Overview

Dataset statistics

Number of variables: 141
Number of observations: 707
Missing cells: 83352
Missing cells (%): 83.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 778.9 KiB
Average record size in memory: 1.1 KiB
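
The percentages in this table follow from the raw counts. The sketch below recomputes them from the figures reported above, assuming the missing-cell percentage is taken over all cells (variables times observations) and the average record size is total memory divided by the number of observations. The variable names are illustrative and not part of the report.

```python
# Figures copied from the dataset statistics table.
n_variables = 141
n_observations = 707
missing_cells = 83352
total_memory_kib = 778.9

# Assumption: the percentage is computed over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: average record size = total memory / number of rows.
avg_record_kib = total_memory_kib / n_observations

print(f"{total_cells} cells, {missing_pct:.1f}% missing")
print(f"{avg_record_kib:.1f} KiB per record")
```

Under these assumptions the results round to the reported 83.6% and 1.1 KiB.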

Variable types

Categorical: 19
Text: 4
Boolean: 4
Numeric: 10
Unsupported: 104

Alerts

body_site has constant value "stool" Constant
antibiotics_current_use has constant value "False" Constant
non_westernized has constant value "False" Constant
sequencing_platform has constant value "IlluminaHiSeq" Constant
population has constant value "Dutch" Constant
BMI is highly overall correlated with DNA_extraction_kit and 3 other fields High correlation
DNA_extraction_kit is highly overall correlated with BMI and 25 other fields High correlation
PMID is highly overall correlated with BMI and 25 other fields High correlation
age is highly overall correlated with DNA_extraction_kit and 4 other fields High correlation
age_category is highly overall correlated with DNA_extraction_kit and 18 other fields High correlation
birth_weight is highly overall correlated with DNA_extraction_kit and 11 other fields High correlation
born_method is highly overall correlated with DNA_extraction_kit and 9 other fields High correlation
breastfeeding_duration is highly overall correlated with DNA_extraction_kit and 9 other fields High correlation
country is highly overall correlated with DNA_extraction_kit and 21 other fields High correlation
curator is highly overall correlated with BMI and 25 other fields High correlation
days_from_first_collection is highly overall correlated with DNA_extraction_kit and 10 other fields High correlation
disease is highly overall correlated with DNA_extraction_kit and 19 other fields High correlation
disease_subtype is highly overall correlated with DNA_extraction_kit and 9 other fields High correlation
family_role is highly overall correlated with DNA_extraction_kit and 15 other fields High correlation
feeding_practice is highly overall correlated with DNA_extraction_kit and 11 other fields High correlation
fobt is highly overall correlated with DNA_extraction_kit and 4 other fields High correlation
formula_first_day is highly overall correlated with DNA_extraction_kit and 10 other fields High correlation
gender is highly overall correlated with birth_weight High correlation
gestational_age is highly overall correlated with DNA_extraction_kit and 9 other fields High correlation
infant_age is highly overall correlated with DNA_extraction_kit and 10 other fields High correlation
location is highly overall correlated with DNA_extraction_kit and 7 other fields High correlation
median_read_length is highly overall correlated with DNA_extraction_kit and 6 other fields High correlation
minimum_read_length is highly overall correlated with DNA_extraction_kit and 13 other fields High correlation
number_bases is highly overall correlated with DNA_extraction_kit and 5 other fields High correlation
number_reads is highly overall correlated with DNA_extraction_kit and 6 other fields High correlation
pregnant is highly overall correlated with DNA_extraction_kit and 14 other fields High correlation
study_condition is highly overall correlated with DNA_extraction_kit and 19 other fields High correlation
study_name is highly overall correlated with BMI and 25 other fields High correlation
median_read_length is highly imbalanced (73.4%) Imbalance
antibiotics_current_use has 626 (88.5%) missing values Missing
age has 626 (88.5%) missing values Missing
infant_age has 552 (78.1%) missing values Missing
NCBI_accession has 355 (50.2%) missing values Missing
pregnant has 436 (61.7%) missing values Missing
lactating has 707 (100.0%) missing values Missing
BMI has 627 (88.7%) missing values Missing
family has 436 (61.7%) missing values Missing
treatment has 707 (100.0%) missing values Missing
days_from_first_collection has 436 (61.7%) missing values Missing
family_role has 436 (61.7%) missing values Missing
born_method has 552 (78.1%) missing values Missing
feeding_practice has 556 (78.6%) missing values Missing
location has 626 (88.5%) missing values Missing
diet has 707 (100.0%) missing values Missing
travel_destination has 707 (100.0%) missing values Missing
visit_number has 707 (100.0%) missing values Missing
premature has 707 (100.0%) missing values Missing
birth_weight has 552 (78.1%) missing values Missing
gestational_age has 552 (78.1%) missing values Missing
antibiotics_family has 707 (100.0%) missing values Missing
disease_subtype has 352 (49.8%) missing values Missing
days_after_onset has 707 (100.0%) missing values Missing
creatine has 707 (100.0%) missing values Missing
albumine has 707 (100.0%) missing values Missing
hscrp has 707 (100.0%) missing values Missing
ESR has 707 (100.0%) missing values Missing
ast has 707 (100.0%) missing values Missing
alt has 707 (100.0%) missing values Missing
globulin has 707 (100.0%) missing values Missing
urea_nitrogen has 707 (100.0%) missing values Missing
BASDAI has 707 (100.0%) missing values Missing
BASFI has 707 (100.0%) missing values Missing
alcohol has 707 (100.0%) missing values Missing
flg_genotype has 707 (100.0%) missing values Missing
population has 352 (49.8%) missing values Missing
menopausal_status has 707 (100.0%) missing values Missing
lifestyle has 707 (100.0%) missing values Missing
body_subsite has 707 (100.0%) missing values Missing
uncurated_metadata has 707 (100.0%) missing values Missing
tnm has 707 (100.0%) missing values Missing
triglycerides has 707 (100.0%) missing values Missing
hdl has 707 (100.0%) missing values Missing
ldl has 707 (100.0%) missing values Missing
hba1c has 707 (100.0%) missing values Missing
change_in_tumor_size has 707 (100.0%) missing values Missing
RECIST has 707 (100.0%) missing values Missing
ORR has 707 (100.0%) missing values Missing
smoker has 707 (100.0%) missing values Missing
ever_smoker has 707 (100.0%) missing values Missing
dental_sample_type has 707 (100.0%) missing values Missing
history_of_periodontitis has 707 (100.0%) missing values Missing
PPD_M has 707 (100.0%) missing values Missing
PPD_B has 707 (100.0%) missing values Missing
PPD_D has 707 (100.0%) missing values Missing
PPD_L has 707 (100.0%) missing values Missing
fobt has 626 (88.5%) missing values Missing
disease_stage has 707 (100.0%) missing values Missing
disease_location has 707 (100.0%) missing values Missing
calprotectin has 707 (100.0%) missing values Missing
HBI has 707 (100.0%) missing values Missing
SCCAI has 707 (100.0%) missing values Missing
mumps has 707 (100.0%) missing values Missing
cholesterol has 707 (100.0%) missing values Missing
c_peptide has 707 (100.0%) missing values Missing
glucose has 707 (100.0%) missing values Missing
creatinine has 707 (100.0%) missing values Missing
bilubirin has 707 (100.0%) missing values Missing
prothrombin_time has 707 (100.0%) missing values Missing
wbc has 707 (100.0%) missing values Missing
rbc has 707 (100.0%) missing values Missing
hemoglobinometry has 707 (100.0%) missing values Missing
FMT_role has 707 (100.0%) missing values Missing
subcohort has 707 (100.0%) missing values Missing
fmt_id has 707 (100.0%) missing values Missing
remission has 707 (100.0%) missing values Missing
dyastolic_p has 707 (100.0%) missing values Missing
systolic_p has 707 (100.0%) missing values Missing
insulin_cat has 707 (100.0%) missing values Missing
adiponectin has 707 (100.0%) missing values Missing
glp_1 has 707 (100.0%) missing values Missing
cd163 has 707 (100.0%) missing values Missing
il_1 has 707 (100.0%) missing values Missing
leptin has 707 (100.0%) missing values Missing
fgf_19 has 707 (100.0%) missing values Missing
glutamate_decarboxylase_2_antibody has 707 (100.0%) missing values Missing
HLA has 707 (100.0%) missing values Missing
autoantibody_positive has 707 (100.0%) missing values Missing
age_seroconversion has 707 (100.0%) missing values Missing
age_T1D_diagnosis has 707 (100.0%) missing values Missing
hitchip_probe_class has 707 (100.0%) missing values Missing
previous_therapy has 707 (100.0%) missing values Missing
performance_status has 707 (100.0%) missing values Missing
toxicity_above_zero has 707 (100.0%) missing values Missing
PFS12 has 707 (100.0%) missing values Missing
fasting_insulin has 707 (100.0%) missing values Missing
fasting_glucose has 707 (100.0%) missing values Missing
protein_intake has 707 (100.0%) missing values Missing
stec_count has 707 (100.0%) missing values Missing
shigatoxin_2_elisa has 707 (100.0%) missing values Missing
stool_texture has 707 (100.0%) missing values Missing
anti_PD_1 has 707 (100.0%) missing values Missing
ajcc has 707 (100.0%) missing values Missing
smoke has 707 (100.0%) missing values Missing
bristol_score has 707 (100.0%) missing values Missing
hsCRP has 707 (100.0%) missing values Missing
LDL has 707 (100.0%) missing values Missing
mgs_richness has 707 (100.0%) missing values Missing
ferm_milk_prod_consumer has 707 (100.0%) missing values Missing
inr has 707 (100.0%) missing values Missing
birth_control_pil has 707 (100.0%) missing values Missing
c_section_type has 707 (100.0%) missing values Missing
hla_drb12 has 707 (100.0%) missing values Missing
hla_dqa12 has 707 (100.0%) missing values Missing
hla_dqa11 has 707 (100.0%) missing values Missing
hla_drb11 has 707 (100.0%) missing values Missing
zigosity has 707 (100.0%) missing values Missing
brinkman_index has 707 (100.0%) missing values Missing
alcohol_numeric has 707 (100.0%) missing values Missing
breastfeeding_duration has 571 (80.8%) missing values Missing
formula_first_day has 555 (78.5%) missing values Missing
ALT has 707 (100.0%) missing values Missing
eGFR has 707 (100.0%) missing values Missing
sample_id has unique values Unique
number_reads has unique values Unique
number_bases has unique values Unique
lactating is an unsupported type, check if it needs cleaning or further analysis Unsupported
treatment is an unsupported type, check if it needs cleaning or further analysis Unsupported
diet is an unsupported type, check if it needs cleaning or further analysis Unsupported
travel_destination is an unsupported type, check if it needs cleaning or further analysis Unsupported
visit_number is an unsupported type, check if it needs cleaning or further analysis Unsupported
premature is an unsupported type, check if it needs cleaning or further analysis Unsupported
antibiotics_family is an unsupported type, check if it needs cleaning or further analysis Unsupported
days_after_onset is an unsupported type, check if it needs cleaning or further analysis Unsupported
creatine is an unsupported type, check if it needs cleaning or further analysis Unsupported
albumine is an unsupported type, check if it needs cleaning or further analysis Unsupported
hscrp is an unsupported type, check if it needs cleaning or further analysis Unsupported
ESR is an unsupported type, check if it needs cleaning or further analysis Unsupported
ast is an unsupported type, check if it needs cleaning or further analysis Unsupported
alt is an unsupported type, check if it needs cleaning or further analysis Unsupported
globulin is an unsupported type, check if it needs cleaning or further analysis Unsupported
urea_nitrogen is an unsupported type, check if it needs cleaning or further analysis Unsupported
BASDAI is an unsupported type, check if it needs cleaning or further analysis Unsupported
BASFI is an unsupported type, check if it needs cleaning or further analysis Unsupported
alcohol is an unsupported type, check if it needs cleaning or further analysis Unsupported
flg_genotype is an unsupported type, check if it needs cleaning or further analysis Unsupported
menopausal_status is an unsupported type, check if it needs cleaning or further analysis Unsupported
lifestyle is an unsupported type, check if it needs cleaning or further analysis Unsupported
body_subsite is an unsupported type, check if it needs cleaning or further analysis Unsupported
uncurated_metadata is an unsupported type, check if it needs cleaning or further analysis Unsupported
tnm is an unsupported type, check if it needs cleaning or further analysis Unsupported
triglycerides is an unsupported type, check if it needs cleaning or further analysis Unsupported
hdl is an unsupported type, check if it needs cleaning or further analysis Unsupported
ldl is an unsupported type, check if it needs cleaning or further analysis Unsupported
hba1c is an unsupported type, check if it needs cleaning or further analysis Unsupported
change_in_tumor_size is an unsupported type, check if it needs cleaning or further analysis Unsupported
RECIST is an unsupported type, check if it needs cleaning or further analysis Unsupported
ORR is an unsupported type, check if it needs cleaning or further analysis Unsupported
smoker is an unsupported type, check if it needs cleaning or further analysis Unsupported
ever_smoker is an unsupported type, check if it needs cleaning or further analysis Unsupported
dental_sample_type is an unsupported type, check if it needs cleaning or further analysis Unsupported
history_of_periodontitis is an unsupported type, check if it needs cleaning or further analysis Unsupported
PPD_M is an unsupported type, check if it needs cleaning or further analysis Unsupported
PPD_B is an unsupported type, check if it needs cleaning or further analysis Unsupported
PPD_D is an unsupported type, check if it needs cleaning or further analysis Unsupported
PPD_L is an unsupported type, check if it needs cleaning or further analysis Unsupported
disease_stage is an unsupported type, check if it needs cleaning or further analysis Unsupported
disease_location is an unsupported type, check if it needs cleaning or further analysis Unsupported
calprotectin is an unsupported type, check if it needs cleaning or further analysis Unsupported
HBI is an unsupported type, check if it needs cleaning or further analysis Unsupported
SCCAI is an unsupported type, check if it needs cleaning or further analysis Unsupported
mumps is an unsupported type, check if it needs cleaning or further analysis Unsupported
cholesterol is an unsupported type, check if it needs cleaning or further analysis Unsupported
c_peptide is an unsupported type, check if it needs cleaning or further analysis Unsupported
glucose is an unsupported type, check if it needs cleaning or further analysis Unsupported
creatinine is an unsupported type, check if it needs cleaning or further analysis Unsupported
bilubirin is an unsupported type, check if it needs cleaning or further analysis Unsupported
prothrombin_time is an unsupported type, check if it needs cleaning or further analysis Unsupported
wbc is an unsupported type, check if it needs cleaning or further analysis Unsupported
rbc is an unsupported type, check if it needs cleaning or further analysis Unsupported
hemoglobinometry is an unsupported type, check if it needs cleaning or further analysis Unsupported
FMT_role is an unsupported type, check if it needs cleaning or further analysis Unsupported
subcohort is an unsupported type, check if it needs cleaning or further analysis Unsupported
fmt_id is an unsupported type, check if it needs cleaning or further analysis Unsupported
remission is an unsupported type, check if it needs cleaning or further analysis Unsupported
dyastolic_p is an unsupported type, check if it needs cleaning or further analysis Unsupported
systolic_p is an unsupported type, check if it needs cleaning or further analysis Unsupported
insulin_cat is an unsupported type, check if it needs cleaning or further analysis Unsupported
adiponectin is an unsupported type, check if it needs cleaning or further analysis Unsupported
glp_1 is an unsupported type, check if it needs cleaning or further analysis Unsupported
cd163 is an unsupported type, check if it needs cleaning or further analysis Unsupported
il_1 is an unsupported type, check if it needs cleaning or further analysis Unsupported
leptin is an unsupported type, check if it needs cleaning or further analysis Unsupported
fgf_19 is an unsupported type, check if it needs cleaning or further analysis Unsupported
glutamate_decarboxylase_2_antibody is an unsupported type, check if it needs cleaning or further analysis Unsupported
HLA is an unsupported type, check if it needs cleaning or further analysis Unsupported
autoantibody_positive is an unsupported type, check if it needs cleaning or further analysis Unsupported
age_seroconversion is an unsupported type, check if it needs cleaning or further analysis Unsupported
age_T1D_diagnosis is an unsupported type, check if it needs cleaning or further analysis Unsupported
hitchip_probe_class is an unsupported type, check if it needs cleaning or further analysis Unsupported
previous_therapy is an unsupported type, check if it needs cleaning or further analysis Unsupported
performance_status is an unsupported type, check if it needs cleaning or further analysis Unsupported
toxicity_above_zero is an unsupported type, check if it needs cleaning or further analysis Unsupported
PFS12 is an unsupported type, check if it needs cleaning or further analysis Unsupported
fasting_insulin is an unsupported type, check if it needs cleaning or further analysis Unsupported
fasting_glucose is an unsupported type, check if it needs cleaning or further analysis Unsupported
protein_intake is an unsupported type, check if it needs cleaning or further analysis Unsupported
stec_count is an unsupported type, check if it needs cleaning or further analysis Unsupported
shigatoxin_2_elisa is an unsupported type, check if it needs cleaning or further analysis Unsupported
stool_texture is an unsupported type, check if it needs cleaning or further analysis Unsupported
anti_PD_1 is an unsupported type, check if it needs cleaning or further analysis Unsupported
ajcc is an unsupported type, check if it needs cleaning or further analysis Unsupported
smoke is an unsupported type, check if it needs cleaning or further analysis Unsupported
bristol_score is an unsupported type, check if it needs cleaning or further analysis Unsupported
hsCRP is an unsupported type, check if it needs cleaning or further analysis Unsupported
LDL is an unsupported type, check if it needs cleaning or further analysis Unsupported
mgs_richness is an unsupported type, check if it needs cleaning or further analysis Unsupported
ferm_milk_prod_consumer is an unsupported type, check if it needs cleaning or further analysis Unsupported
inr is an unsupported type, check if it needs cleaning or further analysis Unsupported
birth_control_pil is an unsupported type, check if it needs cleaning or further analysis Unsupported
c_section_type is an unsupported type, check if it needs cleaning or further analysis Unsupported
hla_drb12 is an unsupported type, check if it needs cleaning or further analysis Unsupported
hla_dqa12 is an unsupported type, check if it needs cleaning or further analysis Unsupported
hla_dqa11 is an unsupported type, check if it needs cleaning or further analysis Unsupported
hla_drb11 is an unsupported type, check if it needs cleaning or further analysis Unsupported
zigosity is an unsupported type, check if it needs cleaning or further analysis Unsupported
brinkman_index is an unsupported type, check if it needs cleaning or further analysis Unsupported
alcohol_numeric is an unsupported type, check if it needs cleaning or further analysis Unsupported
ALT is an unsupported type, check if it needs cleaning or further analysis Unsupported
eGFR is an unsupported type, check if it needs cleaning or further analysis Unsupported
days_from_first_collection has 69 (9.8%) zeros Zeros
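
The Missing alerts above give a raw count and a percentage for each column. One way to act on them is to sort columns into fully empty, sparse, and usable groups. The sketch below does this for a handful of columns copied from the alerts. The function names `missing_pct` and `triage` and the 80% cut-off `sparse_threshold` are hypothetical choices for illustration, not something the report prescribes.

```python
N_OBSERVATIONS = 707  # number of observations reported in the overview

# Missing counts copied from a few of the Missing alerts.
missing_counts = {
    "age": 626,
    "BMI": 627,
    "infant_age": 552,
    "NCBI_accession": 355,
    "pregnant": 436,
    "lactating": 707,
    "treatment": 707,
}


def missing_pct(count, n=N_OBSERVATIONS):
    """Percentage of missing values, rounded to one decimal place."""
    return round(100 * count / n, 1)


def triage(counts, n=N_OBSERVATIONS, sparse_threshold=80.0):
    """Group columns by how much data they hold (threshold is hypothetical)."""
    groups = {"empty": [], "sparse": [], "usable": []}
    for name, count in counts.items():
        if count == n:
            groups["empty"].append(name)
        elif missing_pct(count, n) >= sparse_threshold:
            groups["sparse"].append(name)
        else:
            groups["usable"].append(name)
    return groups


groups = triage(missing_counts)
for label, names in groups.items():
    print(label, names)
```

Columns in the empty group carry no information for this dataset and are natural candidates for dropping before further analysis. Where to draw the line for sparse columns depends on the analysis.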

Reproduction

Analysis started: 2025-03-30 01:12:46.396837
Analysis finished: 2025-03-30 01:12:55.540996
Duration: 9.14 seconds
Software version: ydata-profiling v4.16.1
Download configuration: config.json
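
The reported duration can be cross-checked against the two timestamps. This is a minimal sketch using only the Python standard library. It assumes both timestamps are in the same time zone.

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2025-03-30 01:12:46.396837")
finished = datetime.fromisoformat("2025-03-30 01:12:55.540996")

# Elapsed wall-clock time in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"Duration: {duration_seconds:.2f} seconds")
```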

Variables

study_name
Categorical

High correlation 

Distinct: 3
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 5.7 KiB

Value            Count
VilaAV_2018      355
YassourM_2018    271
HanniganGD_2017  81

Length

Max length: 15
Median length: 11
Mean length: 12.224894
Min length: 11

Characters and Unicode

Total characters: 8643
Distinct characters: 22
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: HanniganGD_2017
2nd row: HanniganGD_2017
3rd row: HanniganGD_2017
4th row: HanniganGD_2017
5th row: HanniganGD_2017

Common Values

Value            Count  Frequency (%)
VilaAV_2018      355    50.2%
YassourM_2018    271    38.3%
HanniganGD_2017  81     11.5%
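
The length and character statistics reported for study_name can be rederived from the three value counts alone. The sketch below does so with the Python standard library. The variable names are illustrative, and the approach (expanding each value by its count) is a cross-check under that assumption rather than the profiler's own implementation.

```python
from collections import Counter
from statistics import median

# Value counts copied from the Common Values table.
value_counts = {
    "VilaAV_2018": 355,
    "YassourM_2018": 271,
    "HanniganGD_2017": 81,
}

n_rows = sum(value_counts.values())

# One length entry per row, so that the median reflects row frequencies.
lengths = [len(value) for value, count in value_counts.items() for _ in range(count)]
total_characters = sum(lengths)
mean_length = total_characters / n_rows
median_length = median(lengths)

# Character frequencies weighted by how often each value occurs.
char_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        char_counts[ch] += count

print(total_characters, round(mean_length, 6), median_length)
print(char_counts.most_common(3))
```

The totals agree with the report: 8643 characters, a mean length of 12.224894, a median length of 11, and 22 distinct characters.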

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Most occurring characters

Value              Count  Frequency (%)
a                  788    9.1%
V                  710    8.2%
0                  707    8.2%
1                  707    8.2%
_                  707    8.2%
2                  707    8.2%
8                  626    7.2%
s                  542    6.3%
i                  436    5.0%
A                  355    4.1%
Other values (12)  2358   27.3%

sample_id
Text

Unique 

Distinct: 707
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 5.7 KiB

Length

Max length: 28
Median length: 28
Mean length: 17.659123
Min length: 7

Characters and Unicode

Total characters: 12485
Distinct characters: 19
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 707
Unique (%): 100.0%

Sample

1st row: MG100208
2nd row: MG100207
3rd row: MG100206
4th row: MG100205
5th row: MG100204

Value                         Count  Frequency (%)
mg100208                      1      0.1%
egar00001763476_1000ibd00202  1      0.1%
mg100198                      1      0.1%
mg100206                      1      0.1%
mg100205                      1      0.1%
mg100204                      1      0.1%
mg100203                      1      0.1%
mg100202                      1      0.1%
mg100201                      1      0.1%
mg100200                      1      0.1%
Other values (697)            697    98.6%

Most occurring characters

Value             Count  Frequency (%)
0                 3903   31.3%
1                 1392   11.1%
3                 771    6.2%
6                 744    6.0%
G                 707    5.7%
7                 662    5.3%
2                 552    4.4%
4                 429    3.4%
D                 355    2.8%
_                 355    2.8%
Other values (9)  2615   20.9%

Distinct: 516
Distinct (%): 73.0%
Missing: 0
Missing (%): 0.0%
Memory size: 5.7 KiB

Length

Max length: 19
Median length: 16
Mean length: 12.510608
Min length: 6

Characters and Unicode

Total characters: 8845
Distinct characters: 26
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 440
Unique (%): 62.2%

Sample

1st row: HanniganGD_2017_A29
2nd row: HanniganGD_2017_A28
3rd row: HanniganGD_2017_A27
4th row: HanniganGD_2017_A26
5th row: HanniganGD_2017_A25

Value               Count  Frequency (%)
m0038c              5      0.7%
m0072c              5      0.7%
m0226c              5      0.7%
m1098c              5      0.7%
m0333c              5      0.7%
m0388c              5      0.7%
m0399c              5      0.7%
m0346c              5      0.7%
m0201c              5      0.7%
m0327c              5      0.7%
Other values (506)  657    92.9%

Most occurring characters

Value              Count  Frequency (%)
0                  2400   27.1%
1                  661    7.5%
_                  517    5.8%
D                  436    4.9%
M                  387    4.4%
s                  355    4.0%
B                  355    4.0%
I                  355    4.0%
b                  355    4.0%
u                  355    4.0%
Other values (16)  2669   30.2%

body_site
Categorical

Constant 

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.7 KiB

Value  Count
stool  707

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Characters and Unicode

Total characters: 3535
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: stool
2nd row: stool
3rd row: stool
4th row: stool
5th row: stool

Common Values

Value  Count  Frequency (%)
stool  707    100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Most occurring characters

Value  Count  Frequency (%)
o      1414   40.0%
s      707    20.0%
t      707    20.0%
l      707    20.0%

antibiotics_current_use
Boolean

Constant  Missing 

Distinct: 1
Distinct (%): 1.2%
Missing: 626
Missing (%): 88.5%
Memory size: 1.5 KiB

Value      Count  Frequency (%)
False      81     11.5%
(Missing)  626    88.5%

study_condition
Categorical

High correlation 

Distinct: 4
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 5.7 KiB

Value    Count
IBD      355
control  299
CRC      27
adenoma  26

Length

Max length: 7
Median length: 3
Mean length: 4.8387553
Min length: 3

Characters and Unicode

Total characters: 3421
Distinct characters: 15
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: adenoma
2nd row: adenoma
3rd row: adenoma
4th row: adenoma
5th row: adenoma

Common Values

Value    Count  Frequency (%)
IBD      355    50.2%
control  299    42.3%
CRC      27     3.8%
adenoma  26     3.7%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Most occurring characters

Value             Count  Frequency (%)
o                 624    18.2%
I                 355    10.4%
B                 355    10.4%
D                 355    10.4%
n                 325    9.5%
c                 299    8.7%
t                 299    8.7%
r                 299    8.7%
l                 299    8.7%
C                 54     1.6%
Other values (5)  157    4.6%

disease
Categorical

High correlation 

Distinct: 4
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 5.7 KiB

Value    Count
IBD      355
healthy  299
CRC      27
adenoma  26

Length

Max length: 7
Median length: 3
Mean length: 4.8387553
Min length: 3

Characters and Unicode

Total characters: 3421
Distinct characters: 15
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: adenoma
2nd row: adenoma
3rd row: adenoma
4th row: adenoma
5th row: adenoma

Common Values

Value    Count  Frequency (%)
IBD      355    50.2%
healthy  299    42.3%
CRC      27     3.8%
adenoma  26     3.7%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Most occurring characters

Value             Count  Frequency (%)
h                 598    17.5%
I                 355    10.4%
B                 355    10.4%
D                 355    10.4%
a                 351    10.3%
e                 325    9.5%
l                 299    8.7%
t                 299    8.7%
y                 299    8.7%
C                 54     1.6%
Other values (5)  131    3.8%

age
Real number (ℝ)

High correlation  Missing 

Distinct38
Distinct (%)46.9%
Missing626
Missing (%)88.5%
Infinite0
Infinite (%)0.0%
Mean58.580247
Minimum35
Maximum88
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size5.7 KiB

Quantile statistics

Minimum35
5-th percentile43
Q151
median59
Q365
95-th percentile75
Maximum88
Range53
Interquartile range (IQR)14

Descriptive statistics

Standard deviation10.793359
Coefficient of variation (CV)0.18424913
Kurtosis-0.17597086
Mean58.580247
Median Absolute Deviation (MAD)8
Skewness0.16569532
Sum4745
Variance116.4966
MonotonicityNot monotonic
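Several of the `age` figures above are simple functions of one another. The following sketch, which assumes only the numbers printed in this report (707 observations in the overview, 626 missing values for `age`), recomputes the derived statistics as a consistency check. The variable names are illustrative and not part of the report.

```python
# Values copied from the report; n_rows comes from the dataset overview.
n_rows = 707
n_missing = 626
n_distinct = 38
age_sum = 4745
age_min, age_max = 35, 88
q1, q3 = 51, 65
std = 10.793359

n_present = n_rows - n_missing             # rows with a recorded age
missing_pct = 100 * n_missing / n_rows     # share of rows without an age
distinct_pct = 100 * n_distinct / n_present

mean = age_sum / n_present                 # Mean = Sum / non-missing count
value_range = age_max - age_min            # Range = Maximum - Minimum
iqr = q3 - q1                              # IQR = Q3 - Q1
cv = std / mean                            # Coefficient of variation
variance = std ** 2                        # Variance = standard deviation squared

print(f"non-missing rows: {n_present}")
print(f"missing: {missing_pct:.1f}%  distinct: {distinct_pct:.1f}%")
print(f"mean: {mean:.6f}  range: {value_range}  IQR: {iqr}")
print(f"CV: {cv:.8f}  variance: {variance:.4f}")
```

The percentages here use different denominators: "Missing (%)" is relative to all 707 rows, while "Distinct (%)" is relative to the 81 non-missing values.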
Histogram with fixed size bins (bins=38)
ValueCountFrequency (%)
51 7
 
1.0%
61 5
 
0.7%
69 4
 
0.6%
63 4
 
0.6%
64 4
 
0.6%
65 4
 
0.6%
58 3
 
0.4%
47 3
 
0.4%
59 3
 
0.4%
52 3
 
0.4%
Other values (28) 41
 
5.8%
(Missing) 626
88.5%
ValueCountFrequency (%)
35 1
 
0.1%
37 2
0.3%
42 1
 
0.1%
43 2
0.3%
44 1
 
0.1%
45 2
0.3%
46 1
 
0.1%
47 3
0.4%
48 1
 
0.1%
49 2
0.3%
ValueCountFrequency (%)
88 1
 
0.1%
82 1
 
0.1%
80 1
 
0.1%
76 1
 
0.1%
75 2
0.3%
73 2
0.3%
72 1
 
0.1%
71 2
0.3%
70 1
 
0.1%
69 4
0.6%

infant_age
Categorical

High correlation  Missing 

Distinct5
Distinct (%)3.2%
Missing552
Missing (%)78.1%
Memory size5.7 KiB
60.0
33 
14.0
32 
30.0
32 
90.0
31 
0.0
27 

Length

Max length4
Median length4
Mean length3.8258065
Min length3

Characters and Unicode

Total characters593
Distinct characters7
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row60.0
2nd row90.0
3rd row14.0
4th row30.0
5th row60.0

Common Values

ValueCountFrequency (%)
60.0 33
 
4.7%
14.0 32
 
4.5%
30.0 32
 
4.5%
90.0 31
 
4.4%
0.0 27
 
3.8%
(Missing) 552
78.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
60.0 33
21.3%
14.0 32
20.6%
30.0 32
20.6%
90.0 31
20.0%
0.0 27
17.4%

Most occurring characters

ValueCountFrequency (%)
0 278
46.9%
. 155
26.1%
6 33
 
5.6%
1 32
 
5.4%
4 32
 
5.4%
3 32
 
5.4%
9 31
 
5.2%

Most occurring categories

ValueCountFrequency (%)
(unknown) 593
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
0 278
46.9%
. 155
26.1%
6 33
 
5.6%
1 32
 
5.4%
4 32
 
5.4%
3 32
 
5.4%
9 31
 
5.2%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 593
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
0 278
46.9%
. 155
26.1%
6 33
 
5.6%
1 32
 
5.4%
4 32
 
5.4%
3 32
 
5.4%
9 31
 
5.2%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 593
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
0 278
46.9%
. 155
26.1%
6 33
 
5.6%
1 32
 
5.4%
4 32
 
5.4%
3 32
 
5.4%
9 31
 
5.2%

age_category
Categorical

High correlation 

Distinct3
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Memory size5.7 KiB
adult
530 
newborn
155 
senior
 
22

Length

Max length7
Median length5
Mean length5.4695898
Min length5

Characters and Unicode

Total characters3867
Distinct characters13
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowadult
2nd rowadult
3rd rowsenior
4th rowsenior
5th rowadult

Common Values

ValueCountFrequency (%)
adult 530
75.0%
newborn 155
 
21.9%
senior 22
 
3.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
adult 530
75.0%
newborn 155
 
21.9%
senior 22
 
3.1%

Most occurring characters

ValueCountFrequency (%)
a 530
13.7%
d 530
13.7%
u 530
13.7%
l 530
13.7%
t 530
13.7%
n 332
8.6%
e 177
 
4.6%
o 177
 
4.6%
r 177
 
4.6%
w 155
 
4.0%
Other values (3) 199
 
5.1%

Most occurring categories

ValueCountFrequency (%)
(unknown) 3867
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
a 530
13.7%
d 530
13.7%
u 530
13.7%
l 530
13.7%
t 530
13.7%
n 332
8.6%
e 177
 
4.6%
o 177
 
4.6%
r 177
 
4.6%
w 155
 
4.0%
Other values (3) 199
 
5.1%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 3867
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
a 530
13.7%
d 530
13.7%
u 530
13.7%
l 530
13.7%
t 530
13.7%
n 332
8.6%
e 177
 
4.6%
o 177
 
4.6%
r 177
 
4.6%
w 155
 
4.0%
Other values (3) 199
 
5.1%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 3867
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
a 530
13.7%
d 530
13.7%
u 530
13.7%
l 530
13.7%
t 530
13.7%
n 332
8.6%
e 177
 
4.6%
o 177
 
4.6%
r 177
 
4.6%
w 155
 
4.0%
Other values (3) 199
 
5.1%

gender
Categorical

High correlation 

Distinct2
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Memory size5.7 KiB
female
464 
male
243 

Length

Max length6
Median length6
Mean length5.3125884
Min length4

Characters and Unicode

Total characters3756
Distinct characters5
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowfemale
2nd rowmale
3rd rowmale
4th rowfemale
5th rowfemale

Common Values

ValueCountFrequency (%)
female 464
65.6%
male 243
34.4%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
female 464
65.6%
male 243
34.4%

Most occurring characters

ValueCountFrequency (%)
e 1171
31.2%
m 707
18.8%
a 707
18.8%
l 707
18.8%
f 464
 
12.4%

Most occurring categories

ValueCountFrequency (%)
(unknown) 3756
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
e 1171
31.2%
m 707
18.8%
a 707
18.8%
l 707
18.8%
f 464
 
12.4%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 3756
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
e 1171
31.2%
m 707
18.8%
a 707
18.8%
l 707
18.8%
f 464
 
12.4%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 3756
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
e 1171
31.2%
m 707
18.8%
a 707
18.8%
l 707
18.8%
f 464
 
12.4%

country
Categorical

High correlation 

Distinct4
Distinct (%)0.6%
Missing0
Missing (%)0.0%
Memory size5.7 KiB
NLD
355 
FIN
271 
USA
54 
CAN
 
27

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters2121
Distinct characters9
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowCAN
2nd rowCAN
3rd rowCAN
4th rowCAN
5th rowCAN

Common Values

ValueCountFrequency (%)
NLD 355
50.2%
FIN 271
38.3%
USA 54
 
7.6%
CAN 27
 
3.8%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
nld 355
50.2%
fin 271
38.3%
usa 54
 
7.6%
can 27
 
3.8%

Most occurring characters

ValueCountFrequency (%)
N 653
30.8%
L 355
16.7%
D 355
16.7%
F 271
12.8%
I 271
12.8%
A 81
 
3.8%
U 54
 
2.5%
S 54
 
2.5%
C 27
 
1.3%

Most occurring categories

ValueCountFrequency (%)
(unknown) 2121
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
N 653
30.8%
L 355
16.7%
D 355
16.7%
F 271
12.8%
I 271
12.8%
A 81
 
3.8%
U 54
 
2.5%
S 54
 
2.5%
C 27
 
1.3%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 2121
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
N 653
30.8%
L 355
16.7%
D 355
16.7%
F 271
12.8%
I 271
12.8%
A 81
 
3.8%
U 54
 
2.5%
S 54
 
2.5%
C 27
 
1.3%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 2121
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
N 653
30.8%
L 355
16.7%
D 355
16.7%
F 271
12.8%
I 271
12.8%
A 81
 
3.8%
U 54
 
2.5%
S 54
 
2.5%
C 27
 
1.3%

non_westernized
Boolean

Constant 

Distinct1
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size839.0 B
False
707 
ValueCountFrequency (%)
False 707
100.0%

sequencing_platform
Categorical

Constant 

Distinct1
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size5.7 KiB
IlluminaHiSeq
707 

Length

Max length13
Median length13
Mean length13
Min length13

Characters and Unicode

Total characters9191
Distinct characters11
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowIlluminaHiSeq
2nd rowIlluminaHiSeq
3rd rowIlluminaHiSeq
4th rowIlluminaHiSeq
5th rowIlluminaHiSeq

Common Values

ValueCountFrequency (%)
IlluminaHiSeq 707
100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
illuminahiseq 707
100.0%

Most occurring characters

ValueCountFrequency (%)
l 1414
15.4%
i 1414
15.4%
I 707
7.7%
u 707
7.7%
m 707
7.7%
n 707
7.7%
a 707
7.7%
H 707
7.7%
S 707
7.7%
e 707
7.7%

Most occurring categories

ValueCountFrequency (%)
(unknown) 9191
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
l 1414
15.4%
i 1414
15.4%
I 707
7.7%
u 707
7.7%
m 707
7.7%
n 707
7.7%
a 707
7.7%
H 707
7.7%
S 707
7.7%
e 707
7.7%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 9191
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
l 1414
15.4%
i 1414
15.4%
I 707
7.7%
u 707
7.7%
m 707
7.7%
n 707
7.7%
a 707
7.7%
H 707
7.7%
S 707
7.7%
e 707
7.7%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 9191
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
l 1414
15.4%
i 1414
15.4%
I 707
7.7%
u 707
7.7%
m 707
7.7%
n 707
7.7%
a 707
7.7%
H 707
7.7%
S 707
7.7%
e 707
7.7%

DNA_extraction_kit
Categorical

High correlation 

Distinct3
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Memory size5.7 KiB
Qiagen
355 
PowerSoil
271 
MoBio
81 

Length

Max length9
Median length6
Mean length7.0353607
Min length5

Characters and Unicode

Total characters4974
Distinct characters14
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowMoBio
2nd rowMoBio
3rd rowMoBio
4th rowMoBio
5th rowMoBio

Common Values

ValueCountFrequency (%)
Qiagen 355
50.2%
PowerSoil 271
38.3%
MoBio 81
 
11.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
qiagen 355
50.2%
powersoil 271
38.3%
mobio 81
 
11.5%

Most occurring characters

ValueCountFrequency (%)
i 707
14.2%
o 704
14.2%
e 626
12.6%
Q 355
7.1%
a 355
7.1%
g 355
7.1%
n 355
7.1%
P 271
 
5.4%
w 271
 
5.4%
r 271
 
5.4%
Other values (4) 704
14.2%

Most occurring categories

ValueCountFrequency (%)
(unknown) 4974
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
i 707
14.2%
o 704
14.2%
e 626
12.6%
Q 355
7.1%
a 355
7.1%
g 355
7.1%
n 355
7.1%
P 271
 
5.4%
w 271
 
5.4%
r 271
 
5.4%
Other values (4) 704
14.2%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 4974
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
i 707
14.2%
o 704
14.2%
e 626
12.6%
Q 355
7.1%
a 355
7.1%
g 355
7.1%
n 355
7.1%
P 271
 
5.4%
w 271
 
5.4%
r 271
 
5.4%
Other values (4) 704
14.2%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 4974
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
i 707
14.2%
o 704
14.2%
e 626
12.6%
Q 355
7.1%
a 355
7.1%
g 355
7.1%
n 355
7.1%
P 271
 
5.4%
w 271
 
5.4%
r 271
 
5.4%
Other values (4) 704
14.2%

PMID
Categorical

High correlation 

Distinct3
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Memory size5.7 KiB
30567928
355 
30001517
271 
30459201
81 

Length

Max length8
Median length8
Mean length8
Min length8

Characters and Unicode

Total characters5656
Distinct characters10
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row30459201
2nd row30459201
3rd row30459201
4th row30459201
5th row30459201

Common Values

ValueCountFrequency (%)
30567928 355
50.2%
30001517 271
38.3%
30459201 81
 
11.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
30567928 355
50.2%
30001517 271
38.3%
30459201 81
 
11.5%

Most occurring characters

ValueCountFrequency (%)
0 1330
23.5%
3 707
12.5%
5 707
12.5%
7 626
11.1%
1 623
11.0%
9 436
 
7.7%
2 436
 
7.7%
6 355
 
6.3%
8 355
 
6.3%
4 81
 
1.4%

Most occurring categories

ValueCountFrequency (%)
(unknown) 5656
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
0 1330
23.5%
3 707
12.5%
5 707
12.5%
7 626
11.1%
1 623
11.0%
9 436
 
7.7%
2 436
 
7.7%
6 355
 
6.3%
8 355
 
6.3%
4 81
 
1.4%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 5656
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
0 1330
23.5%
3 707
12.5%
5 707
12.5%
7 626
11.1%
1 623
11.0%
9 436
 
7.7%
2 436
 
7.7%
6 355
 
6.3%
8 355
 
6.3%
4 81
 
1.4%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 5656
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
0 1330
23.5%
3 707
12.5%
5 707
12.5%
7 626
11.1%
1 623
11.0%
9 436
 
7.7%
2 436
 
7.7%
6 355
 
6.3%
8 355
 
6.3%
4 81
 
1.4%

number_reads
Real number (ℝ)

High correlation  Unique 

Distinct707
Distinct (%)100.0%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean23644491
Minimum17146
Maximum61282548
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size5.7 KiB

Quantile statistics

Minimum17146
5-th percentile3704368
Q114631833
median22911696
Q333073995
95-th percentile43157820
Maximum61282548
Range61265402
Interquartile range (IQR)18442162

Descriptive statistics

Standard deviation12319299
Coefficient of variation (CV)0.52102195
Kurtosis-0.54290552
Mean23644491
Median Absolute Deviation (MAD)9071348
Skewness0.15906303
Sum1.6716655 × 10¹⁰
Variance1.5176513 × 10¹⁴
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
6133350 1
 
0.1%
24298490 1
 
0.1%
32354982 1
 
0.1%
40224828 1
 
0.1%
35364602 1
 
0.1%
26496 1
 
0.1%
34216240 1
 
0.1%
32741126 1
 
0.1%
28962234 1
 
0.1%
22129382 1
 
0.1%
Other values (697) 697
98.6%
ValueCountFrequency (%)
17146 1
0.1%
26496 1
0.1%
52356 1
0.1%
69510 1
0.1%
125344 1
0.1%
194214 1
0.1%
198646 1
0.1%
330750 1
0.1%
348544 1
0.1%
611596 1
0.1%
ValueCountFrequency (%)
61282548 1
0.1%
57831136 1
0.1%
57570980 1
0.1%
54594488 1
0.1%
53033258 1
0.1%
52420562 1
0.1%
51078896 1
0.1%
50441276 1
0.1%
50113134 1
0.1%
49914668 1
0.1%

number_bases
Real number (ℝ)

High correlation  Unique 

Distinct707
Distinct (%)100.0%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2.3412048 × 10⁹
Minimum2086283
Maximum6.0902587 × 10⁹
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size5.7 KiB

Quantile statistics

Minimum2086283
5-th percentile4.5041067 × 10⁸
Q11.4536673 × 10⁹
median2.2335913 × 10⁹
Q33.247629 × 10⁹
95-th percentile4.2638767 × 10⁹
Maximum6.0902587 × 10⁹
Range6.0881724 × 10⁹
Interquartile range (IQR)1.7939617 × 10⁹

Descriptive statistics

Standard deviation1.1995216 × 10⁹
Coefficient of variation (CV)0.51235226
Kurtosis-0.4914903
Mean2.3412048 × 10⁹
Median Absolute Deviation (MAD)8.7806468 × 10⁸
Skewness0.22538048
Sum1.6552318 × 10¹²
Variance1.438852 × 10¹⁸
MonotonicityNot monotonic
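The large values reported for `number_bases` and `number_reads` are easier to sanity-check in code. The sketch below is an illustrative check under the assumption that the rounded figures in this report are representative. It confirms that each sum is close to the mean times 707 rows, and derives the dataset-wide ratio of bases to reads. That ratio is a derived quantity that the report itself does not state.

```python
# Rounded values copied from the report (number_reads and number_bases sections).
n_rows = 707

reads_mean = 23644491
reads_sum = 1.6716655e10

bases_mean = 2.3412048e9
bases_sum = 1.6552318e12
bases_std = 1.1995216e9


def relative_error(estimate: float, reference: float) -> float:
    """Relative difference between a recomputed value and the reported one."""
    return abs(estimate - reference) / abs(reference)


# Sum should equal Mean x number of rows, because neither column has missing values.
reads_sum_error = relative_error(reads_mean * n_rows, reads_sum)
bases_sum_error = relative_error(bases_mean * n_rows, bases_sum)

# Coefficient of variation recomputed from the reported standard deviation and mean.
bases_cv = bases_std / bases_mean

# Average number of bases per read over the whole dataset (derived, not reported).
bases_per_read = bases_sum / reads_sum

print(f"reads sum relative error: {reads_sum_error:.2e}")
print(f"bases sum relative error: {bases_sum_error:.2e}")
print(f"bases CV: {bases_cv:.6f}")
print(f"bases per read: {bases_per_read:.1f}")
```

Small relative errors are expected here because the report prints only eight significant digits.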
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
763336051 1
 
0.1%
2399181212 1
 
0.1%
3213885342 1
 
0.1%
3999590449 1
 
0.1%
3476933937 1
 
0.1%
2443992 1
 
0.1%
3362292000 1
 
0.1%
3231276322 1
 
0.1%
2855311495 1
 
0.1%
2189611119 1
 
0.1%
Other values (697) 697
98.6%
ValueCountFrequency (%)
2086283 1
0.1%
2443992 1
0.1%
6483099 1
0.1%
6625974 1
0.1%
12254835 1
0.1%
19193033 1
0.1%
19211612 1
0.1%
34675905 1
0.1%
41203287 1
0.1%
59380464 1
0.1%
ValueCountFrequency (%)
6090258687 1
0.1%
5748426209 1
0.1%
5712739338 1
0.1%
5417298173 1
0.1%
5281156418 1
0.1%
5076562618 1
0.1%
5074055134 1
0.1%
5010104788 1
0.1%
4992535615 1
0.1%
4961648630 1
0.1%

minimum_read_length
Real number (ℝ)

High correlation 

Distinct13
Distinct (%)1.8%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean59.762376
Minimum50
Maximum80
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size5.7 KiB

Quantile statistics

Minimum50
5-th percentile50
Q157
median60
Q360
95-th percentile75
Maximum80
Range30
Interquartile range (IQR)3

Descriptive statistics

Standard deviation7.2497025
Coefficient of variation (CV)0.12130881
Kurtosis0.43237454
Mean59.762376
Median Absolute Deviation (MAD)0
Skewness0.78733893
Sum42252
Variance52.558186
MonotonicityNot monotonic
Histogram with fixed size bins (bins=13)
ValueCountFrequency (%)
60 398
56.3%
50 85
 
12.0%
75 81
 
11.5%
51 77
 
10.9%
57 22
 
3.1%
64 15
 
2.1%
63 10
 
1.4%
52 7
 
1.0%
78 5
 
0.7%
71 3
 
0.4%
Other values (3) 4
 
0.6%
ValueCountFrequency (%)
50 85
 
12.0%
51 77
 
10.9%
52 7
 
1.0%
57 22
 
3.1%
60 398
56.3%
63 10
 
1.4%
64 15
 
2.1%
71 3
 
0.4%
73 1
 
0.1%
75 81
 
11.5%
ValueCountFrequency (%)
80 2
 
0.3%
78 5
 
0.7%
76 1
 
0.1%
75 81
 
11.5%
73 1
 
0.1%
71 3
 
0.4%
64 15
 
2.1%
63 10
 
1.4%
60 398
56.3%
57 22
 
3.1%

median_read_length
Categorical

High correlation  Imbalance 

Distinct5
Distinct (%)0.7%
Missing0
Missing (%)0.0%
Memory size5.7 KiB
101
619 
126
79 
100
 
6
125
 
2
95
 
1

Length

Max length3
Median length3
Mean length2.9985856
Min length2

Characters and Unicode

Total characters2120
Distinct characters6
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)0.1%

Sample

1st row126
2nd row126
3rd row126
4th row126
5th row126

Common Values

ValueCountFrequency (%)
101 619
87.6%
126 79
 
11.2%
100 6
 
0.8%
125 2
 
0.3%
95 1
 
0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
101 619
87.6%
126 79
 
11.2%
100 6
 
0.8%
125 2
 
0.3%
95 1
 
0.1%

Most occurring characters

ValueCountFrequency (%)
1 1325
62.5%
0 631
29.8%
2 81
 
3.8%
6 79
 
3.7%
5 3
 
0.1%
9 1
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
(unknown) 2120
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
1 1325
62.5%
0 631
29.8%
2 81
 
3.8%
6 79
 
3.7%
5 3
 
0.1%
9 1
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 2120
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
1 1325
62.5%
0 631
29.8%
2 81
 
3.8%
6 79
 
3.7%
5 3
 
0.1%
9 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 2120
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
1 1325
62.5%
0 631
29.8%
2 81
 
3.8%
6 79
 
3.7%
5 3
 
0.1%
9 1
 
< 0.1%

NCBI_accession
Text

Missing 

Distinct352
Distinct (%)100.0%
Missing355
Missing (%)50.2%
Memory size5.7 KiB

Length

Max length10
Median length10
Mean length10
Min length10

Characters and Unicode

Total characters3520
Distinct characters12
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique352 ?
Unique (%)100.0%

Sample

1st rowSRR5665080
2nd rowSRR5665075
3rd rowSRR5665074
4th rowSRR5665073
5th rowSRR5665072
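
The sample rows and the fixed length of 10 suggest that every `NCBI_accession` value consists of the prefix `SRR` followed by seven digits. The sketch below is a minimal format check built on that assumption. The pattern is inferred from this report only and is not an official definition of accession identifiers. The helper name `is_valid_accession` is hypothetical.

```python
import re

# Assumed pattern, inferred from the sample rows: "SRR" plus exactly seven digits.
SRR_PATTERN = re.compile(r"SRR\d{7}")


def is_valid_accession(value: str) -> bool:
    """Return True if the whole string matches the assumed SRR pattern."""
    return SRR_PATTERN.fullmatch(value) is not None


# Sample rows copied from the report.
sample = ["SRR5665080", "SRR5665075", "SRR5665074", "SRR5665073", "SRR5665072"]

invalid = [value for value in sample if not is_valid_accession(value)]
print("invalid accessions:", invalid)
```

A check like this could flag malformed identifiers among the 352 non-missing values before they are used to fetch sequencing runs.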
ValueCountFrequency (%)
srr5665080 1
 
0.3%
srr7280919 1
 
0.3%
srr5665074 1
 
0.3%
srr5665073 1
 
0.3%
srr5665072 1
 
0.3%
srr5665079 1
 
0.3%
srr5665078 1
 
0.3%
srr5665077 1
 
0.3%
srr5665076 1
 
0.3%
srr5665134 1
 
0.3%
Other values (342) 342
97.2%

Most occurring characters

ValueCountFrequency (%)
R 704
20.0%
8 427
12.1%
0 374
10.6%
S 352
10.0%
7 351
10.0%
2 344
9.8%
5 244
 
6.9%
6 233
 
6.6%
1 180
 
5.1%
9 158
 
4.5%
Other values (2) 153
 
4.3%

Most occurring categories

ValueCountFrequency (%)
(unknown) 3520
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
R 704
20.0%
8 427
12.1%
0 374
10.6%
S 352
10.0%
7 351
10.0%
2 344
9.8%
5 244
 
6.9%
6 233
 
6.6%
1 180
 
5.1%
9 158
 
4.5%
Other values (2) 153
 
4.3%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 3520
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
R 704
20.0%
8 427
12.1%
0 374
10.6%
S 352
10.0%
7 351
10.0%
2 344
9.8%
5 244
 
6.9%
6 233
 
6.6%
1 180
 
5.1%
9 158
 
4.5%
Other values (2) 153
 
4.3%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 3520
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
R 704
20.0%
8 427
12.1%
0 374
10.6%
S 352
10.0%
7 351
10.0%
2 344
9.8%
5 244
 
6.9%
6 233
 
6.6%
1 180
 
5.1%
9 158
 
4.5%
Other values (2) 153
 
4.3%

pregnant
Boolean

High correlation  Missing 

Distinct2
Distinct (%)0.7%
Missing436
Missing (%)61.7%
Memory size1.5 KiB
False
229 
True
 
42
(Missing)
436 
ValueCountFrequency (%)
False 229
32.4%
True 42
 
5.9%
(Missing) 436
61.7%

lactating
Unsupported

Missing  Rejected  Unsupported 

Missing707
Missing (%)100.0%
Memory size5.7 KiB

curator
Categorical

High correlation 

Distinct3
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Memory size5.7 KiB
Ilya_Likhotkin;Paolo_Manghi
355 
Marisa_Metzger
271 
Paolo_Manghi
81 

Length

Max length27
Median length27
Mean length20.298444
Min length12

Characters and Unicode

Total characters14351
Distinct characters20
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowPaolo_Manghi
2nd rowPaolo_Manghi
3rd rowPaolo_Manghi
4th rowPaolo_Manghi
5th rowPaolo_Manghi

Common Values

ValueCountFrequency (%)
Ilya_Likhotkin;Paolo_Manghi 355
50.2%
Marisa_Metzger 271
38.3%
Paolo_Manghi 81
 
11.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
ilya_likhotkin;paolo_manghi 355
50.2%
marisa_metzger 271
38.3%
paolo_manghi 81
 
11.5%

Most occurring characters

ValueCountFrequency (%)
a 1769
12.3%
i 1417
 
9.9%
o 1227
 
8.5%
_ 1062
 
7.4%
M 978
 
6.8%
h 791
 
5.5%
l 791
 
5.5%
n 791
 
5.5%
k 710
 
4.9%
g 707
 
4.9%
Other values (10) 4108
28.6%

Most occurring categories

ValueCountFrequency (%)
(unknown) 14351
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
a 1769
12.3%
i 1417
 
9.9%
o 1227
 
8.5%
_ 1062
 
7.4%
M 978
 
6.8%
h 791
 
5.5%
l 791
 
5.5%
n 791
 
5.5%
k 710
 
4.9%
g 707
 
4.9%
Other values (10) 4108
28.6%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 14351
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
a 1769
12.3%
i 1417
 
9.9%
o 1227
 
8.5%
_ 1062
 
7.4%
M 978
 
6.8%
h 791
 
5.5%
l 791
 
5.5%
n 791
 
5.5%
k 710
 
4.9%
g 707
 
4.9%
Other values (10) 4108
28.6%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 14351
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
a 1769
12.3%
i 1417
 
9.9%
o 1227
 
8.5%
_ 1062
 
7.4%
M 978
 
6.8%
h 791
 
5.5%
l 791
 
5.5%
n 791
 
5.5%
k 710
 
4.9%
g 707
 
4.9%
Other values (10) 4108
28.6%

BMI
Real number (ℝ)

High correlation  Missing 

Distinct75
Distinct (%)93.8%
Missing627
Missing (%)88.7%
Infinite0
Infinite (%)0.0%
Mean28.05552
Minimum18.662015
Maximum57.463494
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size5.7 KiB

Quantile statistics

Minimum18.662015
5-th percentile21.09295
Q123.863606
median27.019632
Q331.606655
95-th percentile35.89538
Maximum57.463494
Range38.801479
Interquartile range (IQR)7.7430497

Descriptive statistics

Standard deviation6.1195984
Coefficient of variation (CV)0.21812457
Kurtosis6.1087119
Mean28.05552
Median Absolute Deviation (MAD)3.4462189
Skewness1.7809348
Sum2244.4416
Variance37.449485
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
21.33821064 3
 
0.4%
35.49786654 2
 
0.3%
21.1552942 2
 
0.3%
27.82931354 2
 
0.3%
21.51385851 1
 
0.1%
46.36734694 1
 
0.1%
29.75206612 1
 
0.1%
21.62064772 1
 
0.1%
23.082542 1
 
0.1%
34.47772096 1
 
0.1%
Other values (65) 65
 
9.2%
(Missing) 627
88.7%
ValueCountFrequency (%)
18.66201469 1
 
0.1%
20.56932966 1
 
0.1%
20.74755019 1
 
0.1%
20.82093992 1
 
0.1%
21.10726644 1
 
0.1%
21.1552942 2
0.3%
21.33821064 3
0.4%
21.51385851 1
 
0.1%
21.60493827 1
 
0.1%
21.62064772 1
 
0.1%
ValueCountFrequency (%)
57.46349378 1
0.1%
46.36734694 1
0.1%
39.50617284 1
0.1%
37.78272346 1
0.1%
35.79604579 1
0.1%
35.49786654 2
0.3%
35.11123879 1
0.1%
34.90026578 1
0.1%
34.47772096 1
0.1%
33.56401384 1
0.1%

family
Text

Missing 

Distinct80
Distinct (%)29.5%
Missing436
Missing (%)61.7%
Memory size5.7 KiB

Length

Max length20
Median length20
Mean length20
Min length20

Characters and Unicode

Total characters5420
Distinct characters19
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique4 ?
Unique (%)1.5%

Sample

1st rowYassourM_2018_M0018C
2nd rowYassourM_2018_M0024M
3rd rowYassourM_2018_M0402C
4th rowYassourM_2018_M0402M
5th rowYassourM_2018_M0402M
ValueCountFrequency (%)
yassourm_2018_m0059c 5
 
1.8%
yassourm_2018_m0259c 5
 
1.8%
yassourm_2018_m1098c 5
 
1.8%
yassourm_2018_m0297c 5
 
1.8%
yassourm_2018_m0487c 5
 
1.8%
yassourm_2018_m0261c 5
 
1.8%
yassourm_2018_m0450c 5
 
1.8%
yassourm_2018_m0038c 5
 
1.8%
yassourm_2018_m0084c 5
 
1.8%
yassourm_2018_m0399c 5
 
1.8%
Other values (70) 221
81.5%

Most occurring characters

ValueCountFrequency (%)
0 658
12.1%
M 658
12.1%
s 542
10.0%
_ 542
10.0%
2 374
 
6.9%
8 352
 
6.5%
1 340
 
6.3%
a 271
 
5.0%
Y 271
 
5.0%
r 271
 
5.0%
Other values (9) 1141
21.1%

Most occurring categories

ValueCountFrequency (%)
(unknown) 5420
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
0 658
12.1%
M 658
12.1%
s 542
10.0%
_ 542
10.0%
2 374
 
6.9%
8 352
 
6.5%
1 340
 
6.3%
a 271
 
5.0%
Y 271
 
5.0%
r 271
 
5.0%
Other values (9) 1141
21.1%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 5420
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
0 658
12.1%
M 658
12.1%
s 542
10.0%
_ 542
10.0%
2 374
 
6.9%
8 352
 
6.5%
1 340
 
6.3%
a 271
 
5.0%
Y 271
 
5.0%
r 271
 
5.0%
Other values (9) 1141
21.1%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 5420
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
0 658
12.1%
M 658
12.1%
s 542
10.0%
_ 542
10.0%
2 374
 
6.9%
8 352
 
6.5%
1 340
 
6.3%
a 271
 
5.0%
Y 271
 
5.0%
r 271
 
5.0%
Other values (9) 1141
21.1%

treatment
Unsupported

Missing  Rejected  Unsupported 

Missing707
Missing (%)100.0%
Memory size5.7 KiB

days_from_first_collection
Real number (ℝ)

High correlation  Missing  Zeros 

Distinct7
Distinct (%)2.6%
Missing436
Missing (%)61.7%
Infinite0
Infinite (%)0.0%
Mean54.118081
Minimum0
Maximum167
Zeros69
Zeros (%)9.8%
Negative0
Negative (%)0.0%
Memory size5.7 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median60
Q377
95-th percentile167
Maximum167
Range167
Interquartile range (IQR)77

Descriptive statistics

Standard deviation52.025002
Coefficient of variation (CV)0.96132384
Kurtosis-0.044597768
Mean54.118081
Median Absolute Deviation (MAD)30
Skewness0.87688467
Sum14666
Variance2706.6008
MonotonicityNot monotonic
Histogram with fixed size bins (bins=7)
ValueCountFrequency (%)
0 69
 
9.8%
77 43
 
6.1%
60 33
 
4.7%
14 32
 
4.5%
30 32
 
4.5%
90 31
 
4.4%
167 31
 
4.4%
(Missing) 436
61.7%
ValueCountFrequency (%)
0 69
9.8%
14 32
4.5%
30 32
4.5%
60 33
4.7%
77 43
6.1%
90 31
4.4%
167 31
4.4%
ValueCountFrequency (%)
167 31
4.4%
90 31
4.4%
77 43
6.1%
60 33
4.7%
30 32
4.5%
14 32
4.5%
0 69
9.8%

family_role
Categorical

High correlation  Missing 

Distinct2
Distinct (%)0.7%
Missing436
Missing (%)61.7%
Memory size5.7 KiB
child
155 
mother
116 

Length

Max length6
Median length5
Mean length5.4280443
Min length5

Characters and Unicode

Total characters1471
Distinct characters10
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowchild
2nd rowmother
3rd rowchild
4th rowmother
5th rowmother

Common Values

ValueCountFrequency (%)
child 155
 
21.9%
mother 116
 
16.4%
(Missing) 436
61.7%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
child 155
57.2%
mother 116
42.8%
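
The two family_role tables use different denominators: the Common Values percentages are shares of all 707 observations (the count given in the dataset overview), while the plot table's percentages are shares of the 271 non-missing values. A small sketch, with `shares` as a hypothetical helper, reproduces both columns:

```python
TOTAL_ROWS = 707  # number of observations from the dataset overview
counts = {"child": 155, "mother": 116}  # non-missing family_role counts


def shares(value_counts, total_rows):
    """Return (share of all rows, share of non-missing rows) in percent, rounded to 0.1."""
    non_missing = sum(value_counts.values())
    return {
        value: (
            round(100 * count / total_rows, 1),
            round(100 * count / non_missing, 1),
        )
        for value, count in value_counts.items()
    }


result = shares(counts, TOTAL_ROWS)
print(result)
```

The same distinction applies to the other categorical variables in this report.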

Most occurring characters

Value  Count  Frequency (%)
h  271  18.4%
c  155  10.5%
i  155  10.5%
l  155  10.5%
d  155  10.5%
m  116  7.9%
o  116  7.9%
t  116  7.9%
e  116  7.9%
r  116  7.9%

Most occurring categories

Value  Count  Frequency (%)
(unknown)  1471  100.0%

Most frequent character per category

(unknown)
Value  Count  Frequency (%)
h  271  18.4%
c  155  10.5%
i  155  10.5%
l  155  10.5%
d  155  10.5%
m  116  7.9%
o  116  7.9%
t  116  7.9%
e  116  7.9%
r  116  7.9%

Most occurring scripts

Value  Count  Frequency (%)
(unknown)  1471  100.0%

Most frequent character per script

(unknown)
Value  Count  Frequency (%)
h  271  18.4%
c  155  10.5%
i  155  10.5%
l  155  10.5%
d  155  10.5%
m  116  7.9%
o  116  7.9%
t  116  7.9%
e  116  7.9%
r  116  7.9%

Most occurring blocks

Value  Count  Frequency (%)
(unknown)  1471  100.0%

Most frequent character per block

(unknown)
Value  Count  Frequency (%)
h  271  18.4%
c  155  10.5%
i  155  10.5%
l  155  10.5%
d  155  10.5%
m  116  7.9%
o  116  7.9%
t  116  7.9%
e  116  7.9%
r  116  7.9%

born_method
Categorical

High correlation  Missing 

Distinct 2
Distinct (%) 1.3%
Missing 552
Missing (%) 78.1%
Memory size 5.7 KiB
vaginal 133
c_section 22

Length

Max length 9
Median length 7
Mean length 7.283871
Min length 7

Characters and Unicode

Total characters 1129
Distinct characters 12
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row vaginal
2nd row vaginal
3rd row vaginal
4th row vaginal
5th row vaginal

Common Values

Value  Count  Frequency (%)
vaginal  133  18.8%
c_section  22  3.1%
(Missing)  552  78.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
vaginal  133  85.8%
c_section  22  14.2%

Most occurring characters

Value  Count  Frequency (%)
a  266  23.6%
i  155  13.7%
n  155  13.7%
v  133  11.8%
g  133  11.8%
l  133  11.8%
c  44  3.9%
_  22  1.9%
s  22  1.9%
e  22  1.9%
Other values (2)  44  3.9%

Most occurring categories

Value  Count  Frequency (%)
(unknown)  1129  100.0%

Most frequent character per category

(unknown)
Value  Count  Frequency (%)
a  266  23.6%
i  155  13.7%
n  155  13.7%
v  133  11.8%
g  133  11.8%
l  133  11.8%
c  44  3.9%
_  22  1.9%
s  22  1.9%
e  22  1.9%
Other values (2)  44  3.9%

Most occurring scripts

Value  Count  Frequency (%)
(unknown)  1129  100.0%

Most frequent character per script

(unknown)
Value  Count  Frequency (%)
a  266  23.6%
i  155  13.7%
n  155  13.7%
v  133  11.8%
g  133  11.8%
l  133  11.8%
c  44  3.9%
_  22  1.9%
s  22  1.9%
e  22  1.9%
Other values (2)  44  3.9%

Most occurring blocks

Value  Count  Frequency (%)
(unknown)  1129  100.0%

Most frequent character per block

(unknown)
Value  Count  Frequency (%)
a  266  23.6%
i  155  13.7%
n  155  13.7%
v  133  11.8%
g  133  11.8%
l  133  11.8%
c  44  3.9%
_  22  1.9%
s  22  1.9%
e  22  1.9%
Other values (2)  44  3.9%

feeding_practice
Categorical

High correlation  Missing 

Distinct 2
Distinct (%) 1.3%
Missing 556
Missing (%) 78.6%
Memory size 5.7 KiB
exclusively_breastfeeding 80
mixed_feeding 71

Length

Max length 25
Median length 25
Mean length 19.357616
Min length 13

Characters and Unicode

Total characters 2923
Distinct characters 19
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row exclusively_breastfeeding
2nd row mixed_feeding
3rd row mixed_feeding
4th row mixed_feeding
5th row mixed_feeding

Common Values

Value  Count  Frequency (%)
exclusively_breastfeeding  80  11.3%
mixed_feeding  71  10.0%
(Missing)  556  78.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
exclusively_breastfeeding  80  53.0%
mixed_feeding  71  47.0%

Most occurring characters

Value  Count  Frequency (%)
e  613  21.0%
i  302  10.3%
d  222  7.6%
l  160  5.5%
s  160  5.5%
x  151  5.2%
g  151  5.2%
n  151  5.2%
f  151  5.2%
_  151  5.2%
Other values (9)  711  24.3%

Most occurring categories

Value  Count  Frequency (%)
(unknown)  2923  100.0%

Most frequent character per category

(unknown)
Value  Count  Frequency (%)
e  613  21.0%
i  302  10.3%
d  222  7.6%
l  160  5.5%
s  160  5.5%
x  151  5.2%
g  151  5.2%
n  151  5.2%
f  151  5.2%
_  151  5.2%
Other values (9)  711  24.3%

Most occurring scripts

Value  Count  Frequency (%)
(unknown)  2923  100.0%

Most frequent character per script

(unknown)
Value  Count  Frequency (%)
e  613  21.0%
i  302  10.3%
d  222  7.6%
l  160  5.5%
s  160  5.5%
x  151  5.2%
g  151  5.2%
n  151  5.2%
f  151  5.2%
_  151  5.2%
Other values (9)  711  24.3%

Most occurring blocks

Value  Count  Frequency (%)
(unknown)  2923  100.0%

Most frequent character per block

(unknown)
Value  Count  Frequency (%)
e  613  21.0%
i  302  10.3%
d  222  7.6%
l  160  5.5%
s  160  5.5%
x  151  5.2%
g  151  5.2%
n  151  5.2%
f  151  5.2%
_  151  5.2%
Other values (9)  711  24.3%

location
Categorical

High correlation  Missing 

Distinct 4
Distinct (%) 4.9%
Missing 626
Missing (%) 88.5%
Memory size 5.7 KiB
Toronto 27
Houston 27
AnnArbor 15
Boston 12

Length

Max length 8
Median length 7
Mean length 7.037037
Min length 6

Characters and Unicode

Total characters 570
Distinct characters 11
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Toronto
2nd row Toronto
3rd row Toronto
4th row Toronto
5th row Toronto

Common Values

Value  Count  Frequency (%)
Toronto  27  3.8%
Houston  27  3.8%
AnnArbor  15  2.1%
Boston  12  1.7%
(Missing)  626  88.5%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
toronto  27  33.3%
houston  27  33.3%
annarbor  15  18.5%
boston  12  14.8%

Most occurring characters

Value  Count  Frequency (%)
o  174  30.5%
n  96  16.8%
t  66  11.6%
r  57  10.0%
s  39  6.8%
A  30  5.3%
T  27  4.7%
H  27  4.7%
u  27  4.7%
b  15  2.6%

Most occurring categories

Value  Count  Frequency (%)
(unknown)  570  100.0%

Most frequent character per category

(unknown)
Value  Count  Frequency (%)
o  174  30.5%
n  96  16.8%
t  66  11.6%
r  57  10.0%
s  39  6.8%
A  30  5.3%
T  27  4.7%
H  27  4.7%
u  27  4.7%
b  15  2.6%

Most occurring scripts

Value  Count  Frequency (%)
(unknown)  570  100.0%

Most frequent character per script

(unknown)
Value  Count  Frequency (%)
o  174  30.5%
n  96  16.8%
t  66  11.6%
r  57  10.0%
s  39  6.8%
A  30  5.3%
T  27  4.7%
H  27  4.7%
u  27  4.7%
b  15  2.6%

Most occurring blocks

Value  Count  Frequency (%)
(unknown)  570  100.0%

Most frequent character per block

(unknown)
Value  Count  Frequency (%)
o  174  30.5%
n  96  16.8%
t  66  11.6%
r  57  10.0%
s  39  6.8%
A  30  5.3%
T  27  4.7%
H  27  4.7%
u  27  4.7%
b  15  2.6%

diet
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

travel_destination
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

visit_number
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

premature
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

birth_weight
Real number (ℝ)

High correlation  Missing 

Distinct 35
Distinct (%) 22.6%
Missing 552
Missing (%) 78.1%
Infinite 0
Infinite (%) 0.0%
Mean 3474.1935
Minimum 2650
Maximum 4330
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 5.7 KiB

Quantile statistics

Minimum 2650
5-th percentile 2913
Q1 3205
Median 3505
Q3 3660
95-th percentile 4005
Maximum 4330
Range 1680
Interquartile range (IQR) 455

Descriptive statistics

Standard deviation 355.53667
Coefficient of variation (CV) 0.10233646
Kurtosis 0.10306626
Mean 3474.1935
Median Absolute Deviation (MAD) 235
Skewness 0.0039501813
Sum 538500
Variance 126406.33
Monotonicity Not monotonic
Histogram with fixed size bins (bins=35)
Value  Count  Frequency (%)
3090  10  1.4%
4005  9  1.3%
3270  5  0.7%
3385  5  0.7%
3685  5  0.7%
3570  5  0.7%
3175  5  0.7%
3210  5  0.7%
3205  5  0.7%
3640  5  0.7%
Other values (25)  96  13.6%
(Missing)  552  78.1%

Value  Count  Frequency (%)
2650  4  0.6%
2780  4  0.6%
2970  5  0.7%
3090  10  1.4%
3095  4  0.6%
3120  1  0.1%
3145  1  0.1%
3150  4  0.6%
3175  5  0.7%
3205  5  0.7%

Value  Count  Frequency (%)
4330  1  0.1%
4310  5  0.7%
4005  9  1.3%
3900  1  0.1%
3870  4  0.6%
3775  4  0.6%
3740  5  0.7%
3685  5  0.7%
3670  5  0.7%
3650  5  0.7%
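
Several of the birth_weight figures are derived from others, so they can be checked for internal consistency. The sketch below is illustrative only: `derive` is a hypothetical helper, the inputs are copied from the summary above, and 707 is the observation count from the dataset overview. It recomputes the mean from the sum and the non-missing count, the coefficient of variation as standard deviation over mean, the variance as the squared standard deviation, and the range and IQR from the quantiles.

```python
TOTAL_ROWS = 707  # observations from the dataset overview

# Figures copied from the birth_weight summary above.
reported = {
    "missing": 552,
    "sum": 538500,
    "mean": 3474.1935,
    "std": 355.53667,
    "minimum": 2650,
    "maximum": 4330,
    "q1": 3205,
    "q3": 3660,
}


def derive(r, total_rows):
    """Recompute the derived statistics from the reported base figures."""
    n = total_rows - r["missing"]  # non-missing observations
    return {
        "n": n,
        "mean": r["sum"] / n,
        "cv": r["std"] / r["mean"],
        "variance": r["std"] ** 2,
        "range": r["maximum"] - r["minimum"],
        "iqr": r["q3"] - r["q1"],
    }


derived = derive(reported, TOTAL_ROWS)
print(derived)
```

Small differences in the last digits are expected because the inputs are already rounded in the report.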

gestational_age
Real number (ℝ)

High correlation  Missing 

Distinct 21
Distinct (%) 13.5%
Missing 552
Missing (%) 78.1%
Infinite 0
Infinite (%) 0.0%
Mean 39.981935
Minimum 36.6
Maximum 42.4
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 5.7 KiB

Quantile statistics

Minimum 36.6
5-th percentile 38
Q1 39
Median 40.1
Q3 40.7
95-th percentile 42.1
Maximum 42.4
Range 5.8
Interquartile range (IQR) 1.7

Descriptive statistics

Standard deviation 1.2399451
Coefficient of variation (CV) 0.031012632
Kurtosis 0.084003382
Mean 39.981935
Median Absolute Deviation (MAD) 0.8
Skewness -0.37036001
Sum 6197.2
Variance 1.5374638
Monotonicity Not monotonic
Histogram with fixed size bins (bins=21)
Value  Count  Frequency (%)
40.7  25  3.5%
39  13  1.8%
40.3  12  1.7%
40.9  10  1.4%
38  10  1.4%
40.1  9  1.3%
40.6  9  1.3%
40  6  0.8%
41.4  6  0.8%
41.7  5  0.7%
Other values (11)  50  7.1%
(Missing)  552  78.1%

Value  Count  Frequency (%)
36.6  4  0.6%
38  10  1.4%
38.4  5  0.7%
38.7  5  0.7%
38.9  5  0.7%
39  13  1.8%
39.1  5  0.7%
39.3  5  0.7%
39.4  4  0.6%
39.6  3  0.4%

Value  Count  Frequency (%)
42.4  5  0.7%
42.1  5  0.7%
41.7  5  0.7%
41.4  6  0.8%
40.9  10  1.4%
40.7  25  3.5%
40.6  9  1.3%
40.3  12  1.7%
40.1  9  1.3%
40  6  0.8%

antibiotics_family
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

disease_subtype
Categorical

High correlation  Missing 

Distinct 3
Distinct (%) 0.8%
Missing 352
Missing (%) 49.8%
Memory size 5.7 KiB
CD 216
UC 119
undetermined_colitis 20

Length

Max length 20
Median length 2
Mean length 3.0140845
Min length 2

Characters and Unicode

Total characters 1070
Distinct characters 16
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row UC
2nd row UC
3rd row UC
4th row CD
5th row CD

Common Values

Value  Count  Frequency (%)
CD  216  30.6%
UC  119  16.8%
undetermined_colitis  20  2.8%
(Missing)  352  49.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
cd  216  60.8%
uc  119  33.5%
undetermined_colitis  20  5.6%

Most occurring characters

Value  Count  Frequency (%)
C  335  31.3%
D  216  20.2%
U  119  11.1%
e  60  5.6%
i  60  5.6%
n  40  3.7%
d  40  3.7%
t  40  3.7%
u  20  1.9%
r  20  1.9%
Other values (6)  120  11.2%

Most occurring categories

Value  Count  Frequency (%)
(unknown)  1070  100.0%

Most frequent character per category

(unknown)
Value  Count  Frequency (%)
C  335  31.3%
D  216  20.2%
U  119  11.1%
e  60  5.6%
i  60  5.6%
n  40  3.7%
d  40  3.7%
t  40  3.7%
u  20  1.9%
r  20  1.9%
Other values (6)  120  11.2%

Most occurring scripts

Value  Count  Frequency (%)
(unknown)  1070  100.0%

Most frequent character per script

(unknown)
Value  Count  Frequency (%)
C  335  31.3%
D  216  20.2%
U  119  11.1%
e  60  5.6%
i  60  5.6%
n  40  3.7%
d  40  3.7%
t  40  3.7%
u  20  1.9%
r  20  1.9%
Other values (6)  120  11.2%

Most occurring blocks

Value  Count  Frequency (%)
(unknown)  1070  100.0%

Most frequent character per block

(unknown)
Value  Count  Frequency (%)
C  335  31.3%
D  216  20.2%
U  119  11.1%
e  60  5.6%
i  60  5.6%
n  40  3.7%
d  40  3.7%
t  40  3.7%
u  20  1.9%
r  20  1.9%
Other values (6)  120  11.2%

days_after_onset
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

creatine
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

albumine
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

hscrp
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

ESR
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

ast
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

alt
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

globulin
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

urea_nitrogen
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

BASDAI
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

BASFI
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

alcohol
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

flg_genotype
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

population
Categorical

Constant  Missing 

Distinct 1
Distinct (%) 0.3%
Missing 352
Missing (%) 49.8%
Memory size 5.7 KiB
Dutch 355

Length

Max length 5
Median length 5
Mean length 5
Min length 5

Characters and Unicode

Total characters 1775
Distinct characters 5
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Dutch
2nd row Dutch
3rd row Dutch
4th row Dutch
5th row Dutch

Common Values

Value  Count  Frequency (%)
Dutch  355  50.2%
(Missing)  352  49.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
dutch  355  100.0%

Most occurring characters

Value  Count  Frequency (%)
D  355  20.0%
u  355  20.0%
t  355  20.0%
c  355  20.0%
h  355  20.0%

Most occurring categories

Value  Count  Frequency (%)
(unknown)  1775  100.0%

Most frequent character per category

(unknown)
Value  Count  Frequency (%)
D  355  20.0%
u  355  20.0%
t  355  20.0%
c  355  20.0%
h  355  20.0%

Most occurring scripts

Value  Count  Frequency (%)
(unknown)  1775  100.0%

Most frequent character per script

(unknown)
Value  Count  Frequency (%)
D  355  20.0%
u  355  20.0%
t  355  20.0%
c  355  20.0%
h  355  20.0%

Most occurring blocks

Value  Count  Frequency (%)
(unknown)  1775  100.0%

Most frequent character per block

(unknown)
Value  Count  Frequency (%)
D  355  20.0%
u  355  20.0%
t  355  20.0%
c  355  20.0%
h  355  20.0%

menopausal_status
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

lifestyle
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

body_subsite
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

uncurated_metadata
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

tnm
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

triglycerides
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

hdl
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

ldl
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

hba1c
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

change_in_tumor_size
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

RECIST
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

ORR
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

smoker
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

ever_smoker
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

dental_sample_type
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

history_of_periodontitis
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

PPD_M
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

PPD_B
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

PPD_D
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

PPD_L
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

fobt
Boolean

High correlation  Missing 

Distinct 2
Distinct (%) 2.5%
Missing 626
Missing (%) 88.5%
Memory size 1.5 KiB
False 67
True 14
(Missing) 626
Value  Count  Frequency (%)
False  67  9.5%
True  14  2.0%
(Missing)  626  88.5%

disease_stage
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

disease_location
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

calprotectin
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

HBI
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

SCCAI
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

mumps
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

cholesterol
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

c_peptide
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

glucose
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

creatinine
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

bilubirin
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

prothrombin_time
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

wbc
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

rbc
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

hemoglobinometry
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

FMT_role
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

subcohort
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

fmt_id
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

remission
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

dyastolic_p
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

systolic_p
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

insulin_cat
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

adiponectin
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

glp_1
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

cd163
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

il_1
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

leptin
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

fgf_19
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

glutamate_decarboxylase_2_antibody
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

HLA
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

autoantibody_positive
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

age_seroconversion
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

age_T1D_diagnosis
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

hitchip_probe_class
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

previous_therapy
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

performance_status
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

toxicity_above_zero
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

PFS12
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

fasting_insulin
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

fasting_glucose
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

protein_intake
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

stec_count
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

shigatoxin_2_elisa
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

stool_texture
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

anti_PD_1
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

ajcc
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

smoke
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

bristol_score
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

hsCRP
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

LDL
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

mgs_richness
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

ferm_milk_prod_consumer
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

inr
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

birth_control_pil
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

c_section_type
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

hla_drb12
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

hla_dqa12
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

hla_dqa11
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

hla_drb11
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

zigosity
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

brinkman_index
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

alcohol_numeric
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

breastfeeding_duration
Real number (ℝ)

High correlation  Missing 

Distinct 29
Distinct (%) 21.3%
Missing 571
Missing (%) 80.8%
Infinite 0
Infinite (%) 0.0%
Mean 357.47794
Minimum 108
Maximum 735
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 5.7 KiB

Quantile statistics

Minimum 108
5-th percentile 110
Q1 276.25
Median 365
Q3 411
95-th percentile 699
Maximum 735
Range 627
Interquartile range (IQR) 134.75

Descriptive statistics

Standard deviation 138.14951
Coefficient of variation (CV) 0.38645605
Kurtosis 1.1680694
Mean 357.47794
Median Absolute Deviation (MAD) 69
Skewness 0.68877722
Sum 48617
Variance 19085.288
Monotonicity Not monotonic
Histogram with fixed size bins (bins=29)
Value  Count  Frequency (%)
365  10  1.4%
326  5  0.7%
488  5  0.7%
699  5  0.7%
331  5  0.7%
408  5  0.7%
385  5  0.7%
376  5  0.7%
305  5  0.7%
246  5  0.7%
Other values (19)  81  11.5%
(Missing)  571  80.8%

Value  Count  Frequency (%)
108  4  0.6%
110  4  0.6%
147  4  0.6%
182  4  0.6%
217  4  0.6%
246  5  0.7%
265  4  0.6%
268  5  0.7%
279  5  0.7%
296  5  0.7%

Value  Count  Frequency (%)
735  4  0.6%
699  5  0.7%
488  5  0.7%
487  5  0.7%
486  4  0.6%
456  5  0.7%
419  4  0.6%
411  4  0.6%
409  4  0.6%
408  5  0.7%

formula_first_day
Real number (ℝ)

High correlation  Missing 

Distinct 22
Distinct (%) 14.5%
Missing 555
Missing (%) 78.5%
Infinite 0
Infinite (%) 0.0%
Mean 101.14474
Minimum 1
Maximum 252
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 5.7 KiB

Quantile statistics

Minimum 1
5-th percentile 2
Q1 3
Median 101
Q3 184
95-th percentile 227
Maximum 252
Range 251
Interquartile range (IQR) 181

Descriptive statistics

Standard deviation 90.324683
Coefficient of variation (CV) 0.89302406
Kurtosis -1.7662388
Mean 101.14474
Median Absolute Deviation (MAD) 94
Skewness 0.017678414
Sum 15374
Variance 8158.5484
Monotonicity Not monotonic
Histogram with fixed size bins (bins=22)
Value  Count  Frequency (%)
2  26  3.7%
3  22  3.1%
184  14  2.0%
183  9  1.3%
199  5  0.7%
172  5  0.7%
16  5  0.7%
194  5  0.7%
175  5  0.7%
186  5  0.7%
Other values (12)  51  7.2%
(Missing)  555  78.5%

Value  Count  Frequency (%)
1  5  0.7%
2  26  3.7%
3  22  3.1%
4  4  0.6%
6  1  0.1%
16  5  0.7%
47  5  0.7%
69  4  0.6%
101  5  0.7%
139  3  0.4%

Value  Count  Frequency (%)
252  5  0.7%
227  4  0.6%
199  5  0.7%
195  5  0.7%
194  5  0.7%
186  5  0.7%
184  14  2.0%
183  9  1.3%
182  5  0.7%
175  5  0.7%

ALT
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

eGFR
Unsupported

Missing  Rejected  Unsupported 

Missing 707
Missing (%) 100.0%
Memory size 5.7 KiB

Interactions

[Figures: pairwise interaction plots, rendered with Matplotlib v3.10.0]
2025-03-29T21:12:53.771208image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/
2025-03-29T21:12:48.496752image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/
2025-03-29T21:12:49.078235image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/
2025-03-29T21:12:49.671328image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/
2025-03-29T21:12:50.273939image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/
2025-03-29T21:12:50.818555image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/
2025-03-29T21:12:51.369304image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/
2025-03-29T21:12:52.083501image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/
2025-03-29T21:12:52.656958image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/
2025-03-29T21:12:53.228761image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/
2025-03-29T21:12:53.831445image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/
2025-03-29T21:12:48.545217image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/
2025-03-29T21:12:49.134562image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/
2025-03-29T21:12:49.727428image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/
2025-03-29T21:12:50.333137image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/
2025-03-29T21:12:50.866369image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/
2025-03-29T21:12:51.425998image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/
2025-03-29T21:12:52.141873image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/
2025-03-29T21:12:52.727028image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/
2025-03-29T21:12:53.285429image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Correlations

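variable | BMI | DNA_extraction_kit | PMID | age | age_category | birth_weight | born_method | breastfeeding_duration | country | curator | days_from_first_collection | disease | disease_subtype | family_role | feeding_practice | fobt | formula_first_day | gender | gestational_age | infant_age | location | median_read_length | minimum_read_length | number_bases | number_reads | pregnant | study_condition | study_name
BMI | 1.000 | 1.000 | 1.000 | -0.181 | 0.000 | NaN | 0.000 | NaN | 0.000 | 1.000 | NaN | 0.083 | 0.000 | 0.000 | 0.000 | 0.275 | NaN | 0.277 | NaN | 0.000 | 0.159 | 0.000 | NaN | -0.145 | -0.145 | 0.000 | 0.083 | 1.000
DNA_extraction_kit | 1.000 | 1.000 | 1.000 | 1.000 | 0.585 | 1.000 | 1.000 | 1.000 | 0.999 | 1.000 | 1.000 | 0.891 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.192 | 1.000 | 1.000 | 1.000 | 0.709 | 0.927 | 0.598 | 0.636 | 1.000 | 0.891 | 1.000
PMID | 1.000 | 1.000 | 1.000 | 1.000 | 0.585 | 1.000 | 1.000 | 1.000 | 0.999 | 1.000 | 1.000 | 0.891 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.192 | 1.000 | 1.000 | 1.000 | 0.709 | 0.927 | 0.598 | 0.636 | 1.000 | 0.891 | 1.000
age | -0.181 | 1.000 | 1.000 | 1.000 | 0.868 | NaN | 0.000 | NaN | 0.398 | 1.000 | NaN | 0.203 | 0.000 | 0.000 | 0.000 | 0.069 | NaN | 0.000 | NaN | 0.000 | 0.155 | 0.000 | NaN | 0.006 | 0.005 | 0.000 | 0.203 | 1.000
age_category | 0.000 | 0.585 | 0.585 | 0.868 | 1.000 | 1.000 | 1.000 | 1.000 | 0.619 | 0.585 | 0.784 | 0.553 | 1.000 | 0.992 | 1.000 | 0.000 | 1.000 | 0.158 | 1.000 | 1.000 | 0.286 | 0.372 | 0.525 | 0.323 | 0.321 | 0.482 | 0.553 | 0.585
birth_weight | NaN | 1.000 | 1.000 | NaN | 1.000 | 1.000 | 0.353 | -0.176 | 1.000 | 1.000 | 0.009 | 1.000 | 0.000 | 1.000 | 0.556 | 0.000 | 0.128 | 0.599 | 0.390 | 0.000 | 0.000 | 0.282 | -0.056 | 0.015 | 0.012 | 1.000 | 1.000 | 1.000
born_method | 0.000 | 1.000 | 1.000 | 0.000 | 1.000 | 0.353 | 1.000 | 0.483 | 1.000 | 1.000 | 0.000 | 1.000 | 0.000 | 1.000 | 0.200 | 0.000 | 0.343 | 0.112 | 0.470 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000
breastfeeding_duration | NaN | 1.000 | 1.000 | NaN | 1.000 | -0.176 | 0.483 | 1.000 | 1.000 | 1.000 | -0.030 | 1.000 | 0.000 | 1.000 | 0.452 | 0.000 | 0.406 | 0.448 | -0.049 | 0.000 | 0.000 | 0.000 | -0.089 | 0.079 | 0.078 | 1.000 | 1.000 | 1.000
country | 0.000 | 0.999 | 0.999 | 0.398 | 0.619 | 1.000 | 1.000 | 1.000 | 1.000 | 0.999 | 1.000 | 0.832 | 1.000 | 1.000 | 1.000 | 0.000 | 1.000 | 0.188 | 1.000 | 1.000 | 0.987 | 0.579 | 0.755 | 0.495 | 0.518 | 1.000 | 0.832 | 0.999
curator | 1.000 | 1.000 | 1.000 | 1.000 | 0.585 | 1.000 | 1.000 | 1.000 | 0.999 | 1.000 | 1.000 | 0.891 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.192 | 1.000 | 1.000 | 1.000 | 0.709 | 0.927 | 0.598 | 0.636 | 1.000 | 0.891 | 1.000
days_from_first_collection | NaN | 1.000 | 1.000 | NaN | 0.784 | 0.009 | 0.000 | -0.030 | 1.000 | 1.000 | 1.000 | 1.000 | 0.000 | 0.784 | 0.000 | 0.000 | -0.028 | 0.364 | -0.023 | 0.997 | 0.000 | 0.000 | -0.001 | 0.077 | 0.077 | 0.540 | 1.000 | 1.000
disease | 0.083 | 0.891 | 0.891 | 0.203 | 0.553 | 1.000 | 1.000 | 1.000 | 0.832 | 0.891 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.375 | 1.000 | 0.207 | 1.000 | 1.000 | 0.536 | 0.466 | 0.664 | 0.462 | 0.463 | 1.000 | 1.000 | 0.891
disease_subtype | 0.000 | 1.000 | 1.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 0.000 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.214 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000
family_role | 0.000 | 1.000 | 1.000 | 0.000 | 0.992 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.784 | 1.000 | 0.000 | 1.000 | 1.000 | 0.000 | 1.000 | 0.494 | 1.000 | 1.000 | 0.000 | 0.000 | 0.109 | 0.240 | 0.237 | 0.482 | 1.000 | 1.000
feeding_practice | 0.000 | 1.000 | 1.000 | 0.000 | 1.000 | 0.556 | 0.200 | 0.452 | 1.000 | 1.000 | 0.000 | 1.000 | 0.000 | 1.000 | 1.000 | 0.000 | 0.980 | 0.000 | 0.402 | 0.000 | 0.000 | 0.000 | 0.000 | 0.327 | 0.303 | 1.000 | 1.000 | 1.000
fobt | 0.275 | 1.000 | 1.000 | 0.069 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.375 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.207 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.375 | 1.000
formula_first_day | NaN | 1.000 | 1.000 | NaN | 1.000 | 0.128 | 0.343 | 0.406 | 1.000 | 1.000 | -0.028 | 1.000 | 0.000 | 1.000 | 0.980 | 0.000 | 1.000 | 0.333 | 0.059 | 0.000 | 0.000 | 0.276 | 0.130 | 0.129 | 0.130 | 1.000 | 1.000 | 1.000
gender | 0.277 | 0.192 | 0.192 | 0.000 | 0.158 | 0.599 | 0.112 | 0.448 | 0.188 | 0.192 | 0.364 | 0.207 | 0.214 | 0.494 | 0.000 | 0.207 | 0.333 | 1.000 | 0.453 | 0.000 | 0.000 | 0.159 | 0.195 | 0.217 | 0.219 | 0.231 | 0.207 | 0.192
gestational_age | NaN | 1.000 | 1.000 | NaN | 1.000 | 0.390 | 0.470 | -0.049 | 1.000 | 1.000 | -0.023 | 1.000 | 0.000 | 1.000 | 0.402 | 0.000 | 0.059 | 0.453 | 1.000 | 0.000 | 0.000 | 0.195 | -0.057 | 0.034 | 0.032 | 1.000 | 1.000 | 1.000
infant_age | 0.000 | 1.000 | 1.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 0.997 | 1.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.061 | 0.119 | 0.300 | 0.293 | 1.000 | 1.000 | 1.000
location | 0.159 | 1.000 | 1.000 | 0.155 | 0.286 | 0.000 | 0.000 | 0.000 | 0.987 | 1.000 | 0.000 | 0.536 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 1.000 | 0.167 | 0.123 | 0.000 | 0.536 | 1.000
median_read_length | 0.000 | 0.709 | 0.709 | 0.000 | 0.372 | 0.282 | 0.000 | 0.000 | 0.579 | 0.709 | 0.000 | 0.466 | 1.000 | 0.000 | 0.000 | 0.000 | 0.276 | 0.159 | 0.195 | 0.061 | 0.000 | 1.000 | 0.615 | 0.324 | 0.353 | 0.000 | 0.466 | 0.709
minimum_read_length | NaN | 0.927 | 0.927 | NaN | 0.525 | -0.056 | 0.000 | -0.089 | 0.755 | 0.927 | -0.001 | 0.664 | 1.000 | 0.109 | 0.000 | 1.000 | 0.130 | 0.195 | -0.057 | 0.119 | 1.000 | 0.615 | 1.000 | -0.623 | -0.628 | 0.000 | 0.664 | 0.927
number_bases | -0.145 | 0.598 | 0.598 | 0.006 | 0.323 | 0.015 | 0.000 | 0.079 | 0.495 | 0.598 | 0.077 | 0.462 | 0.000 | 0.240 | 0.327 | 0.000 | 0.129 | 0.217 | 0.034 | 0.300 | 0.167 | 0.324 | -0.623 | 1.000 | 0.998 | 0.126 | 0.462 | 0.598
number_reads | -0.145 | 0.636 | 0.636 | 0.005 | 0.321 | 0.012 | 0.000 | 0.078 | 0.518 | 0.636 | 0.077 | 0.463 | 0.000 | 0.237 | 0.303 | 0.000 | 0.130 | 0.219 | 0.032 | 0.293 | 0.123 | 0.353 | -0.628 | 0.998 | 1.000 | 0.133 | 0.463 | 0.636
pregnant | 0.000 | 1.000 | 1.000 | 0.000 | 0.482 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.540 | 1.000 | 0.000 | 0.482 | 1.000 | 0.000 | 1.000 | 0.231 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.126 | 0.133 | 1.000 | 1.000 | 1.000
study_condition | 0.083 | 0.891 | 0.891 | 0.203 | 0.553 | 1.000 | 1.000 | 1.000 | 0.832 | 0.891 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.375 | 1.000 | 0.207 | 1.000 | 1.000 | 0.536 | 0.466 | 0.664 | 0.462 | 0.463 | 1.000 | 1.000 | 0.891
study_name | 1.000 | 1.000 | 1.000 | 1.000 | 0.585 | 1.000 | 1.000 | 1.000 | 0.999 | 1.000 | 1.000 | 0.891 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.192 | 1.000 | 1.000 | 1.000 | 0.709 | 0.927 | 0.598 | 0.636 | 1.000 | 0.891 | 1.000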

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
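
The nullity views described above can be approximated directly in pandas. The following is a minimal sketch, not necessarily how the report computes its figures: the helper name `nullity_summary` and the toy values are hypothetical, and only the column names are borrowed from this dataset. It counts missing cells per column and correlates the missing/present indicators between columns.

```python
import numpy as np
import pandas as pd


def nullity_summary(df: pd.DataFrame) -> tuple[pd.Series, pd.DataFrame]:
    """Return per-column missing counts and a nullity correlation matrix."""
    is_null = df.isna()
    missing_counts = is_null.sum()
    # A column that is always present (or always missing) has zero variance,
    # so its correlation is undefined; leave it out of the matrix.
    varying = is_null.loc[:, is_null.nunique() > 1]
    nullity_corr = varying.astype(int).corr()
    return missing_counts, nullity_corr


# Toy frame with hypothetical values (not rows from the profiled dataset).
toy = pd.DataFrame(
    {
        "age": [45.0, 50.0, np.nan, np.nan],
        "infant_age": [np.nan, np.nan, 0.0, 60.0],
        "BMI": [31.6, np.nan, np.nan, 25.2],
        "body_site": ["stool"] * 4,
    }
)

counts, corr = nullity_summary(toy)
print(counts)
print(corr.round(3))
```

In this toy frame `age` and `infant_age` are never recorded together, so their nullity correlation is -1, while `body_site` is always present and is left out of the matrix.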

Sample

study_name | sample_id | subject_id | body_site | antibiotics_current_use | study_condition | disease | age | infant_age | age_category | gender | country | non_westernized | sequencing_platform | DNA_extraction_kit | PMID | number_reads | number_bases | minimum_read_length | median_read_length | NCBI_accession | pregnant | lactating | curator | BMI | family | treatment | days_from_first_collection | family_role | born_method | feeding_practice | location | diet | travel_destination | visit_number | premature | birth_weight | gestational_age | antibiotics_family | disease_subtype | days_after_onset | creatine | albumine | hscrp | ESR | ast | alt | globulin | urea_nitrogen | BASDAI | BASFI | alcohol | flg_genotype | population | menopausal_status | lifestyle | body_subsite | uncurated_metadata | tnm | triglycerides | hdl | ldl | hba1c | change_in_tumor_size | RECIST | ORR | smoker | ever_smoker | dental_sample_type | history_of_periodontitis | PPD_M | PPD_B | PPD_D | PPD_L | fobt | disease_stage | disease_location | calprotectin | HBI | SCCAI | mumps | cholesterol | c_peptide | glucose | creatinine | bilubirin | prothrombin_time | wbc | rbc | hemoglobinometry | FMT_role | subcohort | fmt_id | remission | dyastolic_p | systolic_p | insulin_cat | adiponectin | glp_1 | cd163 | il_1 | leptin | fgf_19 | glutamate_decarboxylase_2_antibody | HLA | autoantibody_positive | age_seroconversion | age_T1D_diagnosis | hitchip_probe_class | previous_therapy | performance_status | toxicity_above_zero | PFS12 | fasting_insulin | fasting_glucose | protein_intake | stec_count | shigatoxin_2_elisa | stool_texture | anti_PD_1 | ajcc | smoke | bristol_score | hsCRP | LDL | mgs_richness | ferm_milk_prod_consumer | inr | birth_control_pil | c_section_type | hla_drb12 | hla_dqa12 | hla_dqa11 | hla_drb11 | zigosity | brinkman_index | alcohol_numeric | breastfeeding_duration | formula_first_day | ALT | eGFR
0HanniganGD_2017MG100208HanniganGD_2017_A29stoolnoadenomaadenoma45.0NaNadultfemaleCANnoIlluminaHiSeqMoBio30459201613335076333605175126SRR5665080NaNNaNPaolo_Manghi31.626276NaNNaNNaNNaNNaNNaNTorontoNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNnoNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
1HanniganGD_2017MG100207HanniganGD_2017_A28stoolnoadenomaadenoma50.0NaNadultmaleCANnoIlluminaHiSeqMoBio304592019320348116163369075126SRR5665075NaNNaNPaolo_Manghi31.673469NaNNaNNaNNaNNaNNaNTorontoNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNnoNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
2HanniganGD_2017MG100206HanniganGD_2017_A27stoolnoadenomaadenoma68.0NaNseniormaleCANnoIlluminaHiSeqMoBio30459201634257078789715975126SRR5665074NaNNaNPaolo_Manghi25.216253NaNNaNNaNNaNNaNNaNTorontoNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNnoNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
3HanniganGD_2017MG100205HanniganGD_2017_A26stoolnoadenomaadenoma80.0NaNseniorfemaleCANnoIlluminaHiSeqMoBio3045920112551662156237431975126SRR5665073NaNNaNPaolo_Manghi28.719723NaNNaNNaNNaNNaNNaNTorontoNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNnoNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
4HanniganGD_2017MG100204HanniganGD_2017_A25stoolnoadenomaadenoma63.0NaNadultfemaleCANnoIlluminaHiSeqMoBio3045920115176232188368221975126SRR5665072NaNNaNPaolo_Manghi27.335640NaNNaNNaNNaNNaNNaNTorontoNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNnoNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
5HanniganGD_2017MG100203HanniganGD_2017_A24stoolnoadenomaadenoma67.0NaNseniorfemaleCANnoIlluminaHiSeqMoBio304592018180768101772116475126SRR5665079NaNNaNPaolo_Manghi25.558846NaNNaNNaNNaNNaNNaNTorontoNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNnoNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
6HanniganGD_2017MG100202HanniganGD_2017_A23stoolnoadenomaadenoma64.0NaNadultmaleCANnoIlluminaHiSeqMoBio30459201550229268260428675126SRR5665078NaNNaNPaolo_Manghi25.057360NaNNaNNaNNaNNaNNaNTorontoNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNnoNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
7HanniganGD_2017MG100201HanniganGD_2017_A22stoolnoadenomaadenoma68.0NaNseniorfemaleCANnoIlluminaHiSeqMoBio30459201744761692504286575126SRR5665077NaNNaNPaolo_Manghi31.588613NaNNaNNaNNaNNaNNaNTorontoNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNnoNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
8HanniganGD_2017MG100200HanniganGD_2017_A21stoolnoadenomaadenoma50.0NaNadultfemaleCANnoIlluminaHiSeqMoBio30459201267327433195533475125SRR5665076NaNNaNPaolo_Manghi23.828125NaNNaNNaNNaNNaNNaNTorontoNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNnoNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
9HanniganGD_2017MG100199HanniganGD_2017_A20stoolnoadenomaadenoma47.0NaNadultfemaleUSAnoIlluminaHiSeqMoBio30459201417230052024898175126SRR5665134NaNNaNPaolo_Manghi24.221453NaNNaNNaNNaNNaNNaNAnnArborNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNnoNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
study_name | sample_id | subject_id | body_site | antibiotics_current_use | study_condition | disease | age | infant_age | age_category | gender | country | non_westernized | sequencing_platform | DNA_extraction_kit | PMID | number_reads | number_bases | minimum_read_length | median_read_length | NCBI_accession | pregnant | lactating | curator | BMI | family | treatment | days_from_first_collection | family_role | born_method | feeding_practice | location | diet | travel_destination | visit_number | premature | birth_weight | gestational_age | antibiotics_family | disease_subtype | days_after_onset | creatine | albumine | hscrp | ESR | ast | alt | globulin | urea_nitrogen | BASDAI | BASFI | alcohol | flg_genotype | population | menopausal_status | lifestyle | body_subsite | uncurated_metadata | tnm | triglycerides | hdl | ldl | hba1c | change_in_tumor_size | RECIST | ORR | smoker | ever_smoker | dental_sample_type | history_of_periodontitis | PPD_M | PPD_B | PPD_D | PPD_L | fobt | disease_stage | disease_location | calprotectin | HBI | SCCAI | mumps | cholesterol | c_peptide | glucose | creatinine | bilubirin | prothrombin_time | wbc | rbc | hemoglobinometry | FMT_role | subcohort | fmt_id | remission | dyastolic_p | systolic_p | insulin_cat | adiponectin | glp_1 | cd163 | il_1 | leptin | fgf_19 | glutamate_decarboxylase_2_antibody | HLA | autoantibody_positive | age_seroconversion | age_T1D_diagnosis | hitchip_probe_class | previous_therapy | performance_status | toxicity_above_zero | PFS12 | fasting_insulin | fasting_glucose | protein_intake | stec_count | shigatoxin_2_elisa | stool_texture | anti_PD_1 | ajcc | smoke | bristol_score | hsCRP | LDL | mgs_richness | ferm_milk_prod_consumer | inr | birth_control_pil | c_section_type | hla_drb12 | hla_dqa12 | hla_dqa11 | hla_drb11 | zigosity | brinkman_index | alcohol_numeric | breastfeeding_duration | formula_first_day | ALT | eGFR
697YassourM_2018G102213M0038MstoolNaNcontrolhealthyNaNNaNadultfemaleFINnoIlluminaHiSeqPowerSoil3000151734497674340540056757101SRR7281035noNaNMarisa_MetzgerNaNYassourM_2018_M0038MNaN167.0motherNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
698YassourM_2018G104686M0053CstoolNaNcontrolhealthyNaN0.0newbornmaleFINnoIlluminaHiSeqPowerSoil300015179513249444246650101SRR7281036noNaNMarisa_MetzgerNaNYassourM_2018_M0053CNaN0.0childvaginalNaNNaNNaNNaNNaNNaN3120.041.4NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
699YassourM_2018G102217M0038CstoolNaNcontrolhealthyNaN60.0newbornmaleFINnoIlluminaHiSeqPowerSoil3000151733034210325772375660101SRR7281038noNaNMarisa_MetzgerNaNYassourM_2018_M0038CNaN60.0childvaginalmixed_feedingNaNNaNNaNNaNNaN3620.039.0NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN246.02.0NaNNaN
700YassourM_2018G102218M0038CstoolNaNcontrolhealthyNaN90.0newbornmaleFINnoIlluminaHiSeqPowerSoil3000151742100416416046776960101SRR7281039noNaNMarisa_MetzgerNaNYassourM_2018_M0038CNaN90.0childvaginalmixed_feedingNaNNaNNaNNaNNaN3620.039.0NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN246.02.0NaNNaN
701YassourM_2018G102211M0038MstoolNaNcontrolhealthyNaNNaNadultfemaleFINnoIlluminaHiSeqPowerSoil3000151739973032394882277660101SRR7281040yesNaNMarisa_MetzgerNaNYassourM_2018_M0038MNaN0.0motherNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
702YassourM_2018G102212M0038MstoolNaNcontrolhealthyNaNNaNadultfemaleFINnoIlluminaHiSeqPowerSoil3000151722249756218905765676100SRR7281041noNaNMarisa_MetzgerNaNYassourM_2018_M0038MNaN77.0motherNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
703YassourM_2018G104681M0024MstoolNaNcontrolhealthyNaNNaNadultfemaleFINnoIlluminaHiSeqPowerSoil3000151761282548609025868750101SRR7281042yesNaNMarisa_MetzgerNaNYassourM_2018_M0024MNaN0.0motherNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
704YassourM_2018G102214M0038CstoolNaNcontrolhealthyNaN0.0newbornmaleFINnoIlluminaHiSeqPowerSoil30001517932136691570986164101SRR7281043noNaNMarisa_MetzgerNaNYassourM_2018_M0038CNaN0.0childvaginalmixed_feedingNaNNaNNaNNaNNaN3620.039.0NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN246.02.0NaNNaN
705YassourM_2018G102215M0038CstoolNaNcontrolhealthyNaN14.0newbornmaleFINnoIlluminaHiSeqPowerSoil3000151725838128254720270263101SRR7281044noNaNMarisa_MetzgerNaNYassourM_2018_M0038CNaN14.0childvaginalmixed_feedingNaNNaNNaNNaNNaN3620.039.0NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN246.02.0NaNNaN
706YassourM_2018G102216M0038CstoolNaNcontrolhealthyNaN30.0newbornmaleFINnoIlluminaHiSeqPowerSoil3000151733162504326420257664101SRR7281045noNaNMarisa_MetzgerNaNYassourM_2018_M0038CNaN30.0childvaginalmixed_feedingNaNNaNNaNNaNNaN3620.039.0NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN246.02.0NaNNaN